Implementing RAG with Spring AI and Pinecone: A Practical Guide

Introduction
Retrieval-Augmented Generation (RAG) has emerged as a powerful technique for building AI applications that combine information retrieval with generative language models. This guide demonstrates how to implement a RAG system using Spring AI with Pinecone as the vector database, specifically for creating a documentation chatbot.
What is RAG?
RAG combines two key components:
- Retrieval: Finds relevant information from a knowledge base using semantic search
- Generation: Uses a language model to generate contextual responses based on retrieved information
System Architecture
Ingestion: [Documentation Website] → [Scraper] → [Chunking] → [Pinecone Vector DB]
Query:     [User Query] → [Spring AI] → [Semantic Search (against Pinecone)] → [LLM Generation] → [Response]
Prerequisites
- Pinecone account (free tier available)
- Spring Boot application (3.x recommended)
- Basic understanding of vector databases
Implementation Steps
1. Setting Up Pinecone Integration
Gradle Dependency
implementation "org.springframework.ai:spring-ai-pinecone-store-spring-boot-starter"
Configuration (application.yml)
spring:
  ai:
    vectorstore:
      pinecone:
        api-key: ${PINECONE_API_KEY}
        environment: ${PINECONE_ENV}
        index-name: ${PINECONE_INDEX}
        project-id: ${PINECONE_PROJECT_ID}
2. Document Processing Pipeline
Web Scraper Implementation
public class DocumentationScraper {

    private final Set<String> visitedUrls = new HashSet<>();
    private final String baseDomain;

    public DocumentationScraper(String baseUrl) {
        this.baseDomain = extractDomain(baseUrl);
    }

    public List<Document> scrape(String startUrl) {
        List<Document> documents = new ArrayList<>();
        scrapeRecursive(startUrl, documents);
        return documents;
    }

    // Includes URL normalization, same-domain checking, and content extraction
    // ... (full implementation as in original)
}
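A minimal sketch of the elided scrapeRecursive method, assuming jsoup (org.jsoup:jsoup) for fetching and parsing HTML; the normalization and extraction logic here is illustrative, not the original implementation:

private void scrapeRecursive(String url, List<Document> documents) {
    String normalized = url.split("#")[0]; // Strip fragments so anchors don't count as new pages
    if (!normalized.contains(baseDomain) || !visitedUrls.add(normalized)) {
        return; // Skip external links and already-visited pages
    }
    try {
        org.jsoup.nodes.Document page = org.jsoup.Jsoup.connect(normalized).get();
        documents.add(new Document(page.body().text(), Map.of("source_url", normalized)));
        for (org.jsoup.nodes.Element link : page.select("a[href]")) {
            scrapeRecursive(link.absUrl("href"), documents); // Follow in-domain links
        }
    } catch (java.io.IOException e) {
        // Log and skip pages that fail to load
    }
}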
Document Chunking Service
@Service
public class DocumentationService {

    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;

    public DocumentationService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.textSplitter = new TokenTextSplitter(
                2000, // Target chunk size in tokens; suits technical documentation
                300,  // Minimum chunk size in characters
                100,  // Minimum chunk length to embed
                15,   // Maximum number of chunks per document
                true  // Keep separators to preserve document structure
        );
    }

    public List<Document> processDocument(String content, Map<String, Object> metadata) {
        Document originalDoc = new Document(content, metadata);
        List<Document> chunks = textSplitter.split(originalDoc);

        // Enhance metadata for better retrieval.
        // Map.put returns the previous value, not the map, so the calls cannot be chained.
        for (int i = 0; i < chunks.size(); i++) {
            Map<String, Object> chunkMetadata = chunks.get(i).getMetadata();
            chunkMetadata.put("chunk_number", i);
            chunkMetadata.put("total_chunks", chunks.size());
        }
        return chunks;
    }
}
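The controller in the next step calls a scrapeAndStoreDocumentation method on this service. One possible shape for it, a sketch tying together the scraper, the splitter, and the vector store:

public void scrapeAndStoreDocumentation(String baseUrl) {
    DocumentationScraper scraper = new DocumentationScraper(baseUrl);
    for (Document page : scraper.scrape(baseUrl)) {
        List<Document> chunks = processDocument(page.getContent(), page.getMetadata());
        vectorStore.add(chunks); // Embeds each chunk and upserts it into Pinecone
    }
}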
3. Knowledge Base Initialization
REST Endpoint for Loading Data
@RestController
@RequestMapping("/document")
@Tag(name = "AI Module API")
public class DocumentController {

    private final DocumentationService documentationService;

    public DocumentController(DocumentationService documentationService) {
        this.documentationService = documentationService;
    }

    @PostMapping("/load-data")
    public ResponseEntity<String> loadDocumentation() {
        documentationService.scrapeAndStoreDocumentation("https://docs.openwes.top");
        return ResponseEntity.ok("Documentation loaded successfully");
    }
}
4. Implementing RAG in Chat Completions
@Service
public class ChatService {

    private final ChatModel chatModel;
    private final VectorStore vectorStore;

    public ChatService(ChatModel chatModel, VectorStore vectorStore) {
        this.chatModel = chatModel;
        this.vectorStore = vectorStore;
    }

    public String generateResponse(String query) {
        SearchRequest searchRequest = SearchRequest.defaults()
                .withTopK(5)                   // Retrieve top 5 relevant chunks
                .withSimilarityThreshold(0.7); // Drop weak matches below 0.7 similarity

        return ChatClient.create(chatModel)
                .prompt()
                .advisors(new QuestionAnswerAdvisor(vectorStore, searchRequest))
                .user(query) // Pass the user query; without this the model has nothing to answer
                .call()
                .content();
    }
}
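A hypothetical endpoint exposing this service over REST; the path and parameter name are illustrative, not part of the original module:

@RestController
@RequestMapping("/chat")
public class ChatController {

    private final ChatService chatService;

    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @GetMapping
    public ResponseEntity<String> chat(@RequestParam String query) {
        return ResponseEntity.ok(chatService.generateResponse(query));
    }
}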
Best Practices
- Optimal Chunking:
  - Technical content: 1500-2500 tokens
  - Narrative content: 500-1000 tokens
  - Include overlap (100-200 tokens) for context preservation (see the splitter sketch below)
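A sketch of splitter configurations for the two content types above; the exact values are assumptions within the recommended ranges. Note that TokenTextSplitter does not take an overlap parameter, so token-level overlap would require a custom splitter:

// Technical docs: larger chunks keep code samples and their explanations together
TokenTextSplitter technicalSplitter = new TokenTextSplitter(2000, 300, 100, 15, true);
// Narrative docs: smaller chunks keep retrieval focused on a single idea
TokenTextSplitter narrativeSplitter = new TokenTextSplitter(800, 200, 50, 30, true);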
- Enhanced Metadata:
metadata.put("document_type", "API Reference");
metadata.put("last_updated", "2024-03-01");
metadata.put("relevance_score", 0.95);
- Hybrid Search: combining keyword and vector scores can improve recall on exact terms. Note that the builder methods below are illustrative; Spring AI's SearchRequest does not expose hybrid-search options, so this would need store-level support or a custom retriever:
SearchRequest hybridRequest = SearchRequest.defaults()
        .withTopK(5)
        .withHybridSearch(true)  // Illustrative only, not a real SearchRequest method
        .withKeywordWeight(0.3); // Weight keyword matches at 30% vs. vector similarity
- Prompt Engineering:
PromptTemplate template = new PromptTemplate("""
        Answer the question based on the following context:
        {context}

        Question: {question}

        If you don't know the answer, say "I don't know".
        """);
Performance Optimization
- Caching: Implement Redis caching for frequent queries
- Async Processing: Use @Async for document ingestion
- Batch Processing: Process documents in batches of 50-100 (sketched below)
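A minimal sketch combining the async and batching suggestions, assuming @EnableAsync is set on a configuration class and a batch size of 100:

@Async
public CompletableFuture<Void> ingestAsync(List<Document> documents) {
    int batchSize = 100;
    for (int i = 0; i < documents.size(); i += batchSize) {
        List<Document> batch = documents.subList(i, Math.min(i + batchSize, documents.size()));
        vectorStore.add(batch); // One embedding + upsert round trip per batch
    }
    return CompletableFuture.completedFuture(null);
}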
Evaluation Metrics
| Metric | Target | Measurement Method |
|---|---|---|
| Retrieval Precision | >85% | Human evaluation |
| Response Latency | <2s | Performance testing |
| User Satisfaction | >4/5 | Feedback surveys |
Conclusion
This implementation demonstrates how to build a production-ready RAG system using Spring AI and Pinecone. Key advantages include:
- Accurate, context-aware responses for documentation queries
- Scalable vector search capabilities
- Easy integration with existing Spring applications
Next Steps
- Implement a user feedback mechanism (a possible DTO shape follows this list):
@PostMapping("/feedback")
public void logFeedback(@RequestBody FeedbackDTO feedback) {
    // Store feedback for continuous improvement
}
- Add analytics dashboard for query patterns
- Implement automatic periodic document updates
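For the feedback endpoint above, one possible shape for FeedbackDTO; the field names are illustrative assumptions:

public record FeedbackDTO(
        String query,    // The user's original question
        String response, // The answer the system produced
        int rating,      // e.g. a 1-5 satisfaction score
        String comment   // Optional free-text feedback
) {}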
Project Reference: The complete implementation is available on GitHub in the module-ai package. Contributions and feedback are welcome!