Building a Smarter Way to Explore Next.js Docs: My Weekend with Semantic Search + RAG

Introduction
This past weekend, fueled by curiosity and the "learn by doing" mantra, I embarked on a mission: to build a more intelligent way to navigate the vast landscape of the Next.js documentation. The result? An AI-powered semantic search and Retrieval-Augmented Generation (RAG) tool specifically designed for the official Next.js docs.
The Problem with Simple Keyword Search
We've all experienced the frustration of sifting through numerous search results, many only tangentially related to our actual query. Traditional keyword-based search often struggles to understand the nuances of language and the context of our questions. This is especially true for the extensive documentation of a powerful framework like Next.js.
My Solution: Semantic Search + RAG
To tackle this, I decided to leverage the power of semantic search and Retrieval-Augmented Generation (RAG). Here's the core idea:
Semantic Search: Instead of just matching keywords, the system understands the meaning behind your question. This allows for more relevant results, even if your query doesn't use the exact terminology present in the documentation.
Retrieval-Augmented Generation (RAG): Once relevant documentation snippets are found, a language model uses this context to generate a more informed and accurate answer to your question, complete with citations. A rough sketch of how these two steps fit together at query time follows below.
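To make those two steps concrete, here is a minimal, simplified sketch of the query side using LangChain.js. Treat it as an illustration rather than my exact code: the ChatGoogleGenerativeAI model name ("gemini-1.5-flash"), the number of retrieved chunks, the AnswerQuestion name, and the prompt wording are placeholders; the only thing it reuses is the Chroma collection that the ingestion step described later creates.

import { Chroma } from "@langchain/community/vectorstores/chroma";
import { ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";
import "dotenv/config";

// Reuse the same embedding model and Chroma collection the ingestion step fills.
const embeddings = new GoogleGenerativeAIEmbeddings({
  model: "models/text-embedding-004",
  apiKey: process.env.GOOGLE_API_KEY,
});
const vectorStore = new Chroma(embeddings, { collectionName: "next-js-docs" });

// Placeholder model name; any chat model supported by LangChain would work here.
const llm = new ChatGoogleGenerativeAI({ model: "gemini-1.5-flash" });

export async function AnswerQuestion(question) {
  // Step 1, semantic search: find the documentation chunks closest in meaning to the question.
  const docs = await vectorStore.similaritySearch(question, 4);

  // Step 2, RAG: hand those chunks to the model as numbered context so it can cite them.
  const context = docs
    .map((doc, i) => `[${i + 1}] ${doc.metadata.source ?? ""}\n${doc.pageContent}`)
    .join("\n\n");

  const response = await llm.invoke(
    `Answer the question using only the context below and cite sources as [n].\n\nContext:\n${context}\n\nQuestion: ${question}`
  );

  return response.content;
}

The important part is that the model only ever sees the retrieved chunks, which keeps its answers grounded in the actual docs and makes per-source citations possible.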
Diving into the Implementation
Here's a peek under the hood at the technologies and processes involved:
1. Populating the Knowledge Base:
The first crucial step was to ingest and process the Next.js documentation. This involved:
Fetching Navigation Links: I started by scraping the main Next.js documentation page to extract all the links to individual documentation pages (a rough sketch of this helper appears right after this list).
Crawling and Extracting Content: For each of these links, I used CheerioWebBaseLoader to fetch the content of the page.
Chunking the Text: Large documents were split into smaller, manageable chunks using RecursiveCharacterTextSplitter. This is important for efficient embedding and retrieval.
Generating Embeddings: The heart of semantic search! I used GoogleGenerativeAIEmbeddings to create vector embeddings for each chunk of the documentation. These embeddings capture the semantic meaning of the text.
Storing in a Vector Database: The generated embeddings were stored in Chroma, an efficient vector store that allows for fast similarity searches.
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";
import { load } from "cheerio";
import "dotenv/config";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { Document } from "@langchain/core/documents";
// The Gemini embedding model that turns each chunk of text into a vector.
const embeddings = new GoogleGenerativeAIEmbeddings({
  model: "models/text-embedding-004",
  apiKey: process.env.GOOGLE_API_KEY,
});

// Split long pages into ~1,000-character chunks with 200 characters of overlap.
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});

// The Chroma collection where the embedded chunks are stored.
const vectorStore = new Chroma(embeddings, {
  collectionName: "next-js-docs",
});
export async function PopulateDatabase() {
  const navs = await GetNavLinks();

  // Bail out if the crawl didn't return any links.
  if (!Array.isArray(navs) || navs.length === 0) {
    throw new Error("Failed to populate database");
  }

  // const testNavs = navs.slice(0, 3);
  const testNavs = navs.slice(1, 375);

  // Load, chunk, embed, and store each documentation page in turn.
  for (let index = 0; index < testNavs.length; index++) {
    const i = testNavs[index];
    try {
      console.info(
        `