Spring AI - Quick Reference
Full Reference: See advanced.md for image generation, multi-modal/vision, advisors/middleware, testing patterns, and prompt templates.
Deep Knowledge: Use mcp__documentation__fetch_docs with technology: spring-ai for comprehensive documentation.
Dependencies
<!-- OpenAI -->
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Azure OpenAI -->
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-azure-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Ollama (local) -->
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>

<!-- Vector Store - PGVector -->
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>
Configuration
OpenAI
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
          max-tokens: 1000
      embedding:
        options:
          model: text-embedding-3-small
Azure OpenAI
spring:
  ai:
    azure:
      openai:
        api-key: ${AZURE_OPENAI_KEY}
        endpoint: ${AZURE_OPENAI_ENDPOINT}
        chat:
          options:
            deployment-name: gpt-4o
            temperature: 0.7
Ollama (Local)
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3
          temperature: 0.7
Basic Chat
@Service
@RequiredArgsConstructor
public class ChatService {

    private final ChatClient chatClient;

    public String chat(String message) {
        return chatClient.prompt()
            .user(message)
            .call()
            .content();
    }

    // With system prompt
    public String chatWithContext(String message) {
        return chatClient.prompt()
            .system("You are a helpful assistant specialized in Spring Boot.")
            .user(message)
            .call()
            .content();
    }

    // With parameters: {topic} in the system text is filled from param("topic", ...)
    public String chatWithParams(String message, String topic) {
        return chatClient.prompt()
            .system(s -> s.text("You are an expert in {topic}.")
                .param("topic", topic))
            .user(message)
            .call()
            .content();
    }
}
ChatClient Builder
@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
            .defaultSystem("You are a helpful AI assistant.")
            .defaultOptions(ChatOptionsBuilder.builder()
                .withTemperature(0.7)
                .withMaxTokens(1000)
                .build())
            .build();
    }
}
Structured Output
public record BookRecommendation(
    String title,
    String author,
    String genre,
    String summary,
    int rating
) {}

@Service
@RequiredArgsConstructor // generates the constructor that initializes the final field
public class BookService {

    private final ChatClient chatClient;

    // Single object: the response is mapped onto the record
    public BookRecommendation getRecommendation(String preferences) {
        return chatClient.prompt()
            .user("Recommend a book based on: " + preferences)
            .call()
            .entity(BookRecommendation.class);
    }

    // Generic collection: ParameterizedTypeReference preserves the element type
    public List<BookRecommendation> getRecommendations(String preferences, int count) {
        return chatClient.prompt()
            .user("Recommend " + count + " books based on: " + preferences)
            .call()
            .entity(new ParameterizedTypeReference<List<BookRecommendation>>() {});
    }
}
Streaming
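@Service
@RequiredArgsConstructor
public class StreamingChatService {

    private final ChatClient chatClient;

    public Flux<String> streamChat(String message) {
        return chatClient.prompt()
            .user(message)
            .stream()
            .content();
    }
}

// WebFlux controller: request mappings belong on a @RestController, not on the @Service
@RestController
@RequiredArgsConstructor
public class StreamingChatController {

    private final StreamingChatService streamingChatService;

    @GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamResponse(@RequestParam String message) {
        return streamingChatService.streamChat(message);
    }
}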
Function Calling
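@Configuration
public class FunctionConfig {

    // WeatherService and ProductService are application classes not shown here (type names assumed);
    // they are injected as bean-method parameters.
    @Bean
    @Description("Get current weather for a location")
    public Function<WeatherRequest, WeatherResponse> currentWeather(WeatherService weatherService) {
        return request -> weatherService.getWeather(request.location());
    }

    @Bean
    @Description("Search for products by name")
    public Function<ProductSearchRequest, List<Product>> searchProducts(ProductService productService) {
        return request -> productService.search(request.query(), request.maxResults());
    }
}

public record WeatherRequest(String location) {}
public record WeatherResponse(String location, double temperature, String conditions) {}
// Assumed shape, inferred from the accessors used in searchProducts()
public record ProductSearchRequest(String query, int maxResults) {}

@Service
@RequiredArgsConstructor
public class AssistantService {

    private final ChatClient chatClient;

    // Function names must match the @Bean method names above
    public String assistWithFunctions(String message) {
        return chatClient.prompt()
            .user(message)
            .functions("currentWeather", "searchProducts")
            .call()
            .content();
    }
}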
Embeddings
@Service
@RequiredArgsConstructor
public class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    public float[] getEmbedding(String text) {
        EmbeddingResponse response = embeddingModel.embedForResponse(List.of(text));
        return response.getResult().getOutput();
    }

    public List<float[]> getEmbeddings(List<String> texts) {
        EmbeddingResponse response = embeddingModel.embedForResponse(texts);
        return response.getResults().stream()
            .map(e -> e.getOutput())
            .toList();
    }
}
Vector Store (RAG)
Configuration
spring:
  ai:
    vectorstore:
      pgvector:
        dimensions: 1536
        index-type: HNSW
        distance-type: COSINE_DISTANCE
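The `distance-type: COSINE_DISTANCE` setting ranks documents by the cosine distance between embedding vectors, and `dimensions` has to match the length of the vectors the embedding model produces (1536 for `text-embedding-3-small`). The following plain-Java sketch illustrates that computation only; PGVector performs it inside the database, and the class and method names here are hypothetical.

```java
/** Illustrative only: cosine distance between two embedding vectors. */
final class CosineDistanceSketch {

    private CosineDistanceSketch() {}

    /** Returns 1 - cos(a, b): 0 for the same direction, 1 for orthogonal, 2 for opposite. */
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            // Same failure mode as storing vectors whose size differs from the configured dimensions
            throw new IllegalArgumentException(
                "dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine distance is undefined for a zero vector");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A smaller distance means a closer match, which is why vectors of different lengths cannot be compared at all and must be rejected up front.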
RAG Query
@Service
@RequiredArgsConstructor
public class RagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public String queryWithContext(String question) {
        // Retrieve relevant documents
        List<Document> relevantDocs = vectorStore.similaritySearch(
            SearchRequest.query(question)
                .withTopK(5)
                .withSimilarityThreshold(0.7)
        );

        // Build context
        String context = relevantDocs.stream()
            .map(Document::getContent)
            .collect(Collectors.joining("\n\n"));

        // Generate response, binding the retrieved text to the {context} placeholder
        return chatClient.prompt()
            .system(s -> s.text("""
                    You are a helpful assistant. Answer questions based on the provided context.
                    If the answer is not in the context, say "I don't have information about that."

                    Context:
                    {context}
                    """)
                .param("context", context))
            .user(question)
            .call()
            .content();
    }
}
QuestionAnswerAdvisor
@Configuration
public class RagConfig {

    @Bean
    public ChatClient ragChatClient(ChatClient.Builder builder, VectorStore vectorStore) {
        return builder
            .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore))
            .build();
    }
}

// Usage is simple: the advisor retrieves documents and adds them to the prompt automatically
@Service
@RequiredArgsConstructor
public class SimpleRagService {

    private final ChatClient ragChatClient;

    public String answer(String question) {
        return ragChatClient.prompt()
            .user(question)
            .call()
            .content();
    }
}
Best Practices
| Do | Don't |
|----|-------|
| Use structured output for predictable results | Parse free-form text manually |
| Implement proper error handling | Ignore API failures |
| Use streaming for long responses | Block on large generations |
| Cache embeddings when possible | Regenerate embeddings repeatedly |
| Set appropriate token limits | Use unlimited tokens |
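The "cache embeddings" advice can be sketched in plain Java as a thin wrapper that remembers the vector for each input text. This is a minimal illustration under stated assumptions: `CachingEmbedder` is a hypothetical name, and the delegate stands in for whatever produces the embedding (for example a method reference such as `embeddingModel::embed`, assuming the Spring AI version in use returns `float[]`).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: caches embeddings by input text so each text is embedded once. */
final class CachingEmbedder {

    private final Map<String, float[]> cache = new ConcurrentHashMap<>();
    private final Function<String, float[]> delegate;

    CachingEmbedder(Function<String, float[]> delegate) {
        this.delegate = delegate;
    }

    /** Calls the delegate only on a cache miss; returns a copy so callers cannot mutate the cache. */
    float[] embed(String text) {
        return cache.computeIfAbsent(text, delegate).clone();
    }

    /** Number of distinct texts cached so far. */
    int size() {
        return cache.size();
    }
}
```

The map here is unbounded and in-memory, which suits a small, stable corpus; a production setup would more likely add eviction or persist the vectors alongside the documents.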
Production Checklist
- API keys secured (environment variables)
- Rate limiting implemented
- Error handling and retries
- Token usage monitoring
- Response caching where appropriate
- Vector store properly indexed
- Embedding dimension consistency
- Prompt injection protection
- Cost monitoring and alerts
- Fallback models configured
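For the "fallback models configured" item, one possible shape is an ordered list of model calls that are tried in turn until one succeeds. The sketch below is plain Java and makes no claim about Spring AI's own facilities; `FallbackChat` is a hypothetical name, and each `Function<String, String>` stands in for a call such as a `ChatClient` bound to a particular model.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: tries each model in order and returns the first successful answer. */
final class FallbackChat {

    private final List<Function<String, String>> models;

    FallbackChat(List<Function<String, String>> models) {
        if (models.isEmpty()) {
            throw new IllegalArgumentException("at least one model is required");
        }
        this.models = List.copyOf(models);
    }

    String chat(String message) {
        RuntimeException last = null;
        for (Function<String, String> model : models) {
            try {
                return model.apply(message);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next model
            }
        }
        throw new IllegalStateException("all models failed", last);
    }
}
```

Ordering the list from preferred to cheapest (or from hosted to local) keeps the policy in one place; the last failure is attached as the cause so it still shows up in logs.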
When NOT to Use This Skill
- Raw OpenAI/Anthropic API - Use respective SDKs directly
- ML model training - Use Python frameworks (PyTorch, TensorFlow)
- Non-Spring applications - Use LangChain or native SDKs
- Simple text generation - May be overkill for trivial use cases
Anti-Patterns
| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| Hardcoded API keys | Security risk | Use environment variables |
| No token limit | Cost explosion | Set max-tokens appropriately |
| Synchronous for long requests | Thread blocking | Use streaming |
| Ignoring rate limits | API errors, bans | Implement retry with backoff |
| No caching for embeddings | High costs | Cache embeddings locally |
| Prompt injection vulnerability | Security risk | Sanitize user input |
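The "retry with backoff" remedy can be illustrated with a small plain-Java helper that doubles the wait after each failed attempt, up to a cap. This is a sketch under assumptions, not a description of Spring AI's built-in retry support; the `Retry` class and its parameters are hypothetical.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a call with exponential backoff between attempts. */
final class Retry {

    private Retry() {}

    /** Delay before the retry that follows attempt number `attempt` (1-based), capped at maxDelayMs. */
    static long backoffMillis(int attempt, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs << Math.min(attempt - 1, 20); // base * 2^(attempt - 1)
        return Math.min(delay, maxDelayMs);
    }

    /** Runs the call up to maxAttempts times and rethrows the last failure if none succeeds. */
    static <T> T withBackoff(Callable<T> call, int maxAttempts, long baseDelayMs, long maxDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis(attempt, baseDelayMs, maxDelayMs));
                }
            }
        }
        throw last;
    }
}
```

A call might look like `Retry.withBackoff(() -> chatService.chat(message), 4, 500, 8000)`. The sketch retries on every exception for brevity; real code would usually retry only on rate-limit (429) or transient errors and add random jitter to the delay.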
Quick Troubleshooting
| Problem | Diagnostic | Fix |
|---------|------------|-----|
| API key invalid | Check error message | Verify OPENAI_API_KEY env var |
| Rate limit exceeded | 429 error | Add retry logic, reduce requests |
| Timeout on large prompts | Connection timeout | Use streaming, increase timeout |
| Embeddings dimension mismatch | Vector store error | Match embedding model dimensions |
| Structured output fails | JSON parse error | Simplify schema, add examples |
Reference Documentation
- Spring AI Reference