Automating SQL Queries with LangChain and Gemini: A Step-by-Step Guide

Learn how to automate SQL query generation and execution using LangChain, Google Gemini AI, and MySQL. This guide walks you through setting up a seamless pipeline from transforming natural language questions into SQL queries to executing them and generating insightful answers.
Environment Setup
Creating a Virtual Environment
Isolate your project dependencies by creating a virtual environment:
py -m venv .venv
# Activate the virtual environment (Windows)
.\.venv\Scripts\activate
Installing Required Packages
Install the necessary Python packages:
pip install mysql-connector-python langchain langchain-community langchain-google-genai python-dotenv
Configuring Environment Variables
Create a .env file with your configuration:
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT="company-sql"
GOOGLE_API_KEY=your_google_api_key
SQL_HOST=your_db_host
SQL_USER=your_db_username
SQL_PASSWORD=your_db_password
SQL_DB_NAME=your_database_name
Building the SQL Automation Pipeline
Importing Required Modules
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langchain import hub
from dotenv import load_dotenv
import os
- ChatGoogleGenerativeAI: Provides access to Google's Gemini models.
- SQLDatabase: Manages the database connection via a SQL URI.
- QuerySQLDatabaseTool: Executes queries and retrieves results.
- hub: Pulls pre-defined prompts from LangChain Hub.
- load_dotenv: Loads environment variables from the .env file.
- os: Reads those environment variables.
Loading Configuration
load_dotenv(override=True)
SQL_HOST = os.getenv("SQL_HOST")
SQL_USER = os.getenv("SQL_USER")
SQL_PASSWORD = os.getenv("SQL_PASSWORD")
SQL_DB_NAME = os.getenv("SQL_DB_NAME")
This code loads sensitive configuration from your .env file, keeping credentials secure and out of your source code.
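If any of these variables are missing, the failure only surfaces later as a confusing connection error. A minimal fail-fast sketch (check_config and REQUIRED_VARS are illustrative additions, not part of the original pipeline):

```python
import os

# Variable names assumed from the .env file described above
REQUIRED_VARS = ["SQL_HOST", "SQL_USER", "SQL_PASSWORD", "SQL_DB_NAME", "GOOGLE_API_KEY"]

def check_config(env=os.environ):
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example with a deliberately incomplete configuration
missing = check_config({"SQL_HOST": "localhost", "SQL_USER": "root"})
```

Calling check_config() right after load_dotenv() and raising if the list is non-empty gives a clear error message up front.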
Initializing Gemini AI Model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
We're using Google's gemini-1.5-flash model, which offers solid performance for SQL generation tasks while being cost-effective.
Establishing Database Connection
connection_uri = f"mysql+mysqlconnector://{SQL_USER}:{SQL_PASSWORD}@{SQL_HOST}/{SQL_DB_NAME}"
db = SQLDatabase.from_uri(connection_uri)
This creates a connection string and establishes a connection to your MySQL database using the credentials from your environment variables.
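One caveat: if the password contains characters such as @ or /, interpolating it directly into the f-string produces a malformed URI. A small sketch that percent-encodes the credentials first (build_mysql_uri is a hypothetical helper, not part of the code above):

```python
from urllib.parse import quote_plus

def build_mysql_uri(user: str, password: str, host: str, db_name: str) -> str:
    """Build a SQLAlchemy-style MySQL URI, percent-encoding the
    credentials so characters like '@' or '/' don't corrupt it."""
    return (
        f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{db_name}"
    )

# A password containing '@' and '/' is safely escaped
uri = build_mysql_uri("root", "p@ss/word", "localhost", "company")
```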
Core Functionality Implementation
SQL Query Generation
query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")
def write_query(question: str):
    """Generate a SQL query from the user's question."""
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": question,
        }
    )
    response = llm.invoke(prompt.to_string())
    extraction_prompt = """
Please extract the SQL query from the following text and return only the SQL query without any additional characters or formatting:
{response}
SQL Query:
"""
    # Format the extraction prompt with the model's response text
    prompt = extraction_prompt.format(response=response.content)
    # Ask the model to return only the bare SQL statement
    parsed_query = llm.invoke(prompt)
    return parsed_query.content
- Pulls a specialized SQL generation prompt from LangChain Hub
- Formats the prompt with database schema information
- Sends the prompt to Gemini AI for query generation
- Uses a second prompt to extract just the SQL from the response
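The second LLM call works, but models typically wrap SQL in markdown fences, which can also be stripped deterministically without another round trip. A sketch of that cheaper alternative (strip_sql_fences is an illustrative helper, not part of the original pipeline):

```python
import re

def strip_sql_fences(text: str) -> str:
    """Extract the SQL from a ```sql ... ``` fenced block if one is
    present; otherwise return the text unchanged (trimmed)."""
    match = re.search(r"```(?:sql)?\s*(.*?)\s*```", text,
                      flags=re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else text).strip()

clean = strip_sql_fences("```sql\nSELECT name FROM employees;\n```")
```

This trades one model invocation per question for a regular expression, at the cost of being less forgiving of unusual response formats.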
Query Execution
def execute_query(query: str):
    """Execute the SQL query."""
    execute_query_tool = QuerySQLDatabaseTool(db=db)
    return execute_query_tool.invoke(query)
This function creates a query execution tool and runs the generated SQL against your database, returning the raw results.
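Because the query comes from a language model, it is prudent to refuse anything other than a read before executing it. A crude allow-list check (is_read_only is an illustrative addition; a real deployment should also connect with a read-only database user):

```python
def is_read_only(query: str) -> bool:
    """Allow only statements whose first keyword is a read operation.
    This is a coarse filter, not a substitute for database permissions."""
    stripped = query.strip()
    if not stripped:
        return False
    first_word = stripped.split(None, 1)[0].upper()
    return first_word in {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}

# Only run the generated SQL if it passes the guard
# if is_read_only(query):
#     result = execute_query(query)
```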
Natural Language Answer Generation
def generate_answer(question: str, query: str, result: str):
    """Generate an answer using the query results."""
    prompt = (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f"Question: {question}\n"
        f"SQL Query: {query}\n"
        f"SQL Result: {result}"
    )
    response = llm.invoke(prompt)
    return response.content
This function takes the original question, generated SQL, and query results, then asks Gemini to formulate a human-friendly answer.
Putting It All Together
question = "Which employee is leading Project Gamma?"
query = write_query(question)
result = execute_query(query)
answer = generate_answer(question, query, result)
print(answer)
Output
Charlie Brown is the Marketing Lead for Project Gamma.
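The three steps above compose naturally into a single helper. A sketch with the stage functions passed in as parameters, so the wiring can be exercised without a live database or API key (the name ask is illustrative):

```python
def ask(question, write_query, execute_query, generate_answer):
    """Run the question -> SQL -> result -> answer pipeline."""
    query = write_query(question)
    result = execute_query(query)
    return generate_answer(question, query, result)

# Stub the three stages to show the data flow end to end
answer = ask(
    "How many employees are there?",
    write_query=lambda q: "SELECT COUNT(*) FROM employees;",
    execute_query=lambda sql: "[(42,)]",
    generate_answer=lambda q, sql, res: "There are 42 employees.",
)
```

In the real script you would simply pass the write_query, execute_query, and generate_answer functions defined above.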
Conclusion
This implementation demonstrates how to create a powerful natural language interface for your SQL databases. By combining LangChain's orchestration capabilities with Gemini's advanced language understanding, you can build systems that make data accessible to non-technical users while maintaining the precision of SQL queries.
Complete Code
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langchain import hub
from dotenv import load_dotenv
import os
# Load environment variables
load_dotenv(override=True)
SQL_HOST = os.getenv("SQL_HOST")
SQL_USER = os.getenv("SQL_USER")
SQL_PASSWORD = os.getenv("SQL_PASSWORD")
SQL_DB_NAME = os.getenv("SQL_DB_NAME")

# Initialize the Gemini model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# Connect to the MySQL database
connection_uri = f"mysql+mysqlconnector://{SQL_USER}:{SQL_PASSWORD}@{SQL_HOST}/{SQL_DB_NAME}"
db = SQLDatabase.from_uri(connection_uri)

# Pull the SQL query prompt from LangChain Hub
query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")
def write_query(question: str):
    """Generate a SQL query from the user's question."""
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": question,
        }
    )
    response = llm.invoke(prompt.to_string())
    extraction_prompt = """
Please extract the SQL query from the following text and return only the SQL query without any additional characters or formatting:
{response}
SQL Query:
"""
    # Format the extraction prompt with the model's response text
    prompt = extraction_prompt.format(response=response.content)
    # Ask the model to return only the bare SQL statement
    parsed_query = llm.invoke(prompt)
    return parsed_query.content
def execute_query(query: str):
    """Execute the SQL query."""
    execute_query_tool = QuerySQLDatabaseTool(db=db)
    return execute_query_tool.invoke(query)
def generate_answer(question: str, query: str, result: str):
    """Generate an answer using the query results."""
    prompt = (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f"Question: {question}\n"
        f"SQL Query: {query}\n"
        f"SQL Result: {result}"
    )
    response = llm.invoke(prompt)
    return response.content
question = "Which employee is leading Project Gamma?"
query = write_query(question)
result = execute_query(query)
answer = generate_answer(question, query, result)
print(answer)