Retrieval‑Augmented Generation (RAG) with Spring AI 2


Customizing AI to answer questions about your company or documents doesn’t have to mean training a costly Large Language Model from scratch. Instead, Retrieval‑Augmented Generation (RAG) offers a practical way forward: it stores your documents in a knowledge base and retrieves only the most relevant ones to enrich the AI’s responses. This keeps answers accurate, context‑aware, and cost‑efficient. In this tutorial, I will give a short introduction to implementing RAG with Spring Boot 4, Spring AI 2, and Neo4j.

If you want to customize an AI, you generally have several options:

Train your own Large Language Model (LLM).
This approach gives you full control over the model’s knowledge and behavior, but it is by far the most expensive and resource‑intensive solution. Training requires massive datasets, specialized hardware (such as GPU clusters), and ongoing maintenance. For most organizations, this effort is simply not cost‑effective.

Provide all relevant documents directly in the prompt.
This method is more straightforward and avoids the complexity of training. However, prompts have strict size limits, and including large amounts of text quickly becomes inefficient and expensive. To make this viable, you would need to carefully select only the most relevant documents for each query, which adds complexity.

Use Retrieval‑Augmented Generation (RAG).
RAG offers a more scalable and elegant solution. Instead of feeding all documents into the prompt, RAG creates an external knowledge base (often stored in a vector database). When a user asks a question, the system retrieves only the most relevant documents based on semantic similarity and injects them into the prompt. This ensures that the AI has the necessary context to answer accurately, while keeping token usage and costs under control.

In this tutorial, we will use Neo4j as our vector database and OpenAI as the LLM provider, which means you need an OpenAI API key. Unlike many other vendors, Neo4j can be installed locally, which makes it a flexible choice for developers who want to experiment without relying solely on cloud services. Of course, other vector databases could be used as well; Spring AI supports a wide range of providers, giving you the freedom to adapt the setup to your environment.

Neo4j can easily be installed with a single Docker command:

docker run --restart always --publish=7474:7474 --publish=7687:7687 neo4j

The database can be accessed at http://localhost:7474/. On your first login you will be asked to set a new password; the default username and password are both neo4j.
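
If you prefer to set the password up front rather than changing it on first login, the official Neo4j image also accepts credentials through the NEO4J_AUTH environment variable in the form username/password. A variant of the command above (replace the placeholder with a password of your choice):

docker run --restart always --publish=7474:7474 --publish=7687:7687 --env NEO4J_AUTH=neo4j/<NEO4J-PASSWORD> neo4j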

To get started, we first need to import all the relevant dependencies. This is done in the pom.xml file, which defines the project’s build configuration and ensures that Spring AI, Neo4j, and OpenAI integrations are available.

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-neo4j</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-webmvc</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
            <version>2.0.0-M1</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-vector-store-neo4j</artifactId>
            <version>2.0.0-M1</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-advisors-vector-store</artifactId>
            <version>2.0.0-M1</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-neo4j-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-webmvc-test</artifactId>
            <scope>test</scope>
        </dependency>

As our foundation, we use spring-ai-starter-model-openai, which integrates OpenAI as the LLM provider. Alongside this, we include spring-ai-advisors-vector-store, a module that enables vector‑store‑based retrieval; we will look at it in more detail later.

To expose a REST interface for interacting with the application, we add spring-boot-starter-webmvc. For persistence and vector storage, we rely on spring-boot-starter-neo4j together with spring-ai-starter-vector-store-neo4j, which allow us to connect Neo4j seamlessly with Spring AI.
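
Since all Spring AI modules share the same version, it can be convenient to manage that version in one place instead of repeating it for every dependency. A minimal sketch using the Spring AI BOM in the dependencyManagement section, assuming the spring-ai-bom artifact is also published for the milestone version used here:

        <dependencyManagement>
            <dependencies>
                <!-- Imports the Spring AI BOM so the individual Spring AI
                     dependencies above can omit their <version> tags. -->
                <dependency>
                    <groupId>org.springframework.ai</groupId>
                    <artifactId>spring-ai-bom</artifactId>
                    <version>2.0.0-M1</version>
                    <type>pom</type>
                    <scope>import</scope>
                </dependency>
            </dependencies>
        </dependencyManagement>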

The beauty of Spring Boot starters is that they automatically configure the necessary beans by inspecting the properties you define. This means we can simply add the required configuration values to our application.properties file, and Spring Boot will wire everything up for us. Please fill in the password and the API key before you continue.

spring.application.name=spring-ai-rag

spring.neo4j.uri=bolt://localhost:7687
spring.neo4j.authentication.username=neo4j
spring.neo4j.authentication.password=<NEO4J-PASSWORD>

spring.ai.vectorstore.type=neo4j
spring.ai.vectorstore.neo4j.initialize-schema=true
spring.ai.vectorstore.neo4j.database-name=neo4j
spring.ai.vectorstore.neo4j.embedding-property=embedding
spring.ai.vectorstore.neo4j.index-name=custom-index
spring.ai.vectorstore.neo4j.distance-type=cosine
spring.ai.vectorstore.neo4j.embedding-dimension=1536

spring.ai.openai.api-key=<OPENAI-KEY>
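
One detail worth noting: the embedding-dimension of 1536 has to match the embedding model that produces the vectors. If you want to pin the OpenAI embedding model explicitly instead of relying on the default, a sketch of the corresponding property with an example model name (text-embedding-3-small also produces 1536-dimensional vectors):

spring.ai.openai.embedding.options.model=text-embedding-3-small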

Next, we will create the AiService. This class is responsible for interacting with the vector store and the OpenAI chat client.

@Service
public class AiService {

    final VectorStore vectorStore;
    final ChatClient chatClient;

    public AiService(VectorStore vectorStore, ChatClient.Builder chatClientBuilder) {
        this.vectorStore = vectorStore;
        this.chatClient = chatClientBuilder.build();
    }

    public void store(RagDocument rag) {
        var doc = Document.builder()
                .id(rag.id())
                .text(rag.text())
                .build();
        vectorStore.add(List.of(doc));
    }

    public String ask(String question) {
        return this.chatClient
                .prompt()
                .advisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .user(question)
                .call()
                .content();
    }

    public record RagDocument(String text, String id) {
    }
}

The store method takes a RagDocument, a simple class containing a text and an ID, and saves it into the vector database. At first glance this looks trivial, but under the hood several important steps occur.

Instead of storing plain text, the system generates an embedding for each document. An embedding is a high‑dimensional vector representation of the text in a semantic space. This allows the database to capture not just the literal words, but their contextual meaning.

To illustrate: in a traditional relational database, searching for “rocket” would not return documents about “space ships,” because the keywords don’t match exactly. A vector database, however, stores embeddings that encode semantic similarity. Words like rocket, space ship, and shuttle are mapped to nearby points in the vector space, so queries can retrieve conceptually related documents even when the exact wording differs.

Creating these embeddings is a complex task. Determining the “meaning” of words and phrases requires deep language understanding. This is where the AI provider comes in. Behind the scenes, whenever you store a document, an embedding model (in our setup, one provided by OpenAI) is called to generate the document’s embedding. That embedding is then stored in the vector database, ready to be retrieved later when a query needs context.
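
You normally never call the embedding model yourself, because the vector store takes care of it when documents are added. To make the idea more tangible, here is a minimal sketch that uses the auto-configured EmbeddingModel bean directly to turn a sentence into a vector (this class is purely illustrative and not part of the example application):

import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Component;

@Component
public class EmbeddingDemo {

    final EmbeddingModel embeddingModel;

    public EmbeddingDemo(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    public void printEmbedding() {
        // Ask the embedding model for the vector representation of a sentence.
        float[] vector = embeddingModel.embed("We use Spring AI to create AI applications.");
        // With the configuration above, the vector has 1536 dimensions.
        System.out.println("Dimensions: " + vector.length);
    }
}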

This process is what makes Retrieval‑Augmented Generation (RAG) powerful: it bridges the gap between raw text and semantic understanding, enabling your AI to answer questions with contextually relevant information rather than relying on keyword matches alone.

The second important method is ask. Here, you provide a simple string containing a question, and the chatClient generates an answer.

A key element in this process is the QuestionAnswerAdvisor, which is initialized with the vectorStore as a parameter. This advisor ensures that all relevant documents are retrieved from the database and added to the question before it is passed to the LLM.

Behind the scenes, the following steps occur (a sketch of how this retrieval can be tuned follows the list):

  • Embedding Creation: The question string is converted into an embedding—a high‑dimensional vector representation of its meaning.
  • Semantic Search: The vector database is queried for embeddings that are conceptually close to the question. This allows the system to find documents that are semantically related, even if they don’t share exact keywords.
  • Context Injection: The retrieved documents are appended to the original question, enriching the prompt with context.
  • Answer Generation: The LLM uses this augmented prompt to generate a precise, context‑aware answer.
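
By default, the QuestionAnswerAdvisor decides how many documents to retrieve and how similar they have to be to the question. If you need more control, the advisor builder also accepts a SearchRequest. The following variant of the ask method is only a sketch with illustrative values, assuming Spring AI’s SearchRequest builder from org.springframework.ai.vectorstore:

    public String ask(String question) {
        // Retrieve at most five documents and drop weakly related matches.
        var searchRequest = SearchRequest.builder()
                .topK(5)
                .similarityThreshold(0.7)
                .build();

        return this.chatClient
                .prompt()
                .advisors(QuestionAnswerAdvisor.builder(vectorStore)
                        .searchRequest(searchRequest)
                        .build())
                .user(question)
                .call()
                .content();
    }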

To test everything, we add a controller that lets us interact with the service over HTTP.

@Controller()
@RequestMapping("/api/rag")
public class AiController {

    final AiService service;

    public AiController(AiService service) {
        this.service = service;
    }

    @PostMapping(value = "/store", consumes = "application/json", produces = "text/plain")
    public ResponseEntity<Void> store(@RequestBody AiService.RagDocument rag) {
        service.store(rag);
        return ResponseEntity.noContent().build();
    }

    @GetMapping(value = "/ask", consumes = "text/plain", produces = "text/plain")
    @ResponseBody
    public String ask(@RequestParam String question) {
        return service.ask(question);
    }
}

The AiController class ties everything together. Its purpose is straightforward: it exposes REST endpoints that allow you to interact with the application. The responsibilities of each endpoint should be self‑explanatory from the code.

We can test our application with two simple curl requests.

$ curl -X POST http://localhost:8080/api/rag/store -H "Content-Type: application/json" -d '{
"id": "123",
"text": "We use spring boot for our applications. We use Spring AI to create AI applications."
}'

$ curl -X GET "http://localhost:8080/api/rag/ask?question=What%20stack%20to%20use%3F" -H "Content-Type: text/plain" -H "Accept: text/plain"
Based on the provided context: use Spring Boot for your applications and Spring AI for building the AI parts.

First, we inserted “We use spring boot for our applications. We use Spring AI to create AI applications.” into the vector database. We then asked “What stack to use?” and got Spring Boot and Spring AI as the answer, although the keyword “stack” was never used in the stored text.

The working example project can be found here.