Build an AI Agent That Understands SQL and PDFs Using MindsDB

In this tutorial, you'll learn how to build a production-ready AI agent that can answer questions using both structured data from a PostgreSQL database and unstructured content from PDF files. We'll use MindsDB, an open-source platform designed to integrate LLM-powered agents with databases and external knowledge sources like documents.
Whether you're developing internal tools, research assistants, or customer support bots, this step-by-step guide will show you how to combine retrieval-augmented generation (RAG), SQL, and language models efficiently.
Tools Used
- MindsDB – agent orchestration and SQL-based pipeline
- PostgreSQL – structured data source
- PDF documents – unstructured data
- OpenAI GPT / Hugging Face Transformers – response generation
Objective
We’ll build an AI agent that can:
- Answer natural language questions by converting them into SQL queries
- Retrieve and summarize content from uploaded PDFs
- Integrate with platforms such as Slack or web interfaces
Step 1: Install MindsDB
You can run MindsDB locally using Docker or use the cloud-hosted version.
Run with Docker:
docker run -p 47334:47334 mindsdb/mindsdb
This provides access to the built-in SQL Editor, REST APIs, and external connectors.
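Once the container is running, open the SQL Editor at http://127.0.0.1:47334 and run a quick query to confirm the instance is responding:
SHOW DATABASES;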
Step 2: Create a Conversational Model
We start by creating a language model capable of natural dialogue, powered by OpenAI or Hugging Face via LangChain.
CREATE MODEL conversational_model
PREDICT answer
USING
engine = 'langchain',
openai_api_key = 'YOUR_OPENAI_API_KEY',
model_name = 'gpt-4',
mode = 'conversational',
user_column = 'question',
assistant_column = 'answer',
prompt_template = 'Answer the user input in a helpful way',
max_tokens = 100,
temperature = 0,
verbose = true;
To verify the model is active:
DESCRIBE conversational_model;
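Before wiring the model into an agent, you can also query it directly to make sure it responds; the question below is just a placeholder:
SELECT question, answer
FROM conversational_model
WHERE question = 'Hello, what can you help me with?';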
Step 3: Add Skills to Your AI Agent
3.1: Text-to-SQL Skill for Structured Data
Connect MindsDB to your PostgreSQL database:
CREATE DATABASE datasource
WITH ENGINE = "postgres",
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "samples.mindsdb.com",
"port": "5432",
"database": "demo",
"schema": "demo_data"
};
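If the connection succeeds, the remote tables become queryable through the datasource prefix. A quick sanity check (assuming the demo schema exposes the house_sales table used below):
SELECT *
FROM datasource.house_sales
LIMIT 3;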
Define a skill for querying this data:
CREATE SKILL text2sql_skill
USING
type = 'text2sql',
database = 'datasource',
tables = ['house_sales'],
description = 'Contains US house sales data from 2007 to 2015';
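Under the hood, this skill translates a natural language question into SQL over the listed tables. A question like "How many houses were sold in 2015?" maps to something roughly like the following (column names are assumed for illustration and depend on the actual schema):
SELECT COUNT(*)
FROM datasource.house_sales
WHERE saledate BETWEEN '2015-01-01' AND '2015-12-31';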
3.2: Knowledge Base Skill for PDFs
Step 1: Create an embedding model:
CREATE MODEL embedding_model_hf
PREDICT embedding
USING
engine = 'langchain_embedding',
class = 'HuggingFaceEmbeddings',
input_columns = ["content"];
Step 2: Create a knowledge base:
CREATE KNOWLEDGE BASE my_knowledge_base
USING
model = embedding_model_hf;
Step 3: Insert PDF content. Upload the PDF through the MindsDB editor first; it then appears as a table under the files database and can be inserted into the knowledge base:
INSERT INTO my_knowledge_base
SELECT * FROM files.my_file_name;
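To confirm the content was embedded, you can run a semantic search against the knowledge base directly; here the content column acts as the search query (exact syntax may vary between MindsDB versions):
SELECT *
FROM my_knowledge_base
WHERE content = 'house price trends'
LIMIT 3;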
Step 4: Create a skill linked to the knowledge base:
CREATE SKILL kb_skill
USING
type = 'knowledge_base',
source = 'my_knowledge_base',
description = 'PDF report with analysis on house pricing trends';
Step 4: Create the AI Agent
Now, connect the conversational model with the two skills.
CREATE AGENT ai_agent
USING
model = 'conversational_model',
skills = ['text2sql_skill', 'kb_skill'];
Test the agent:
SELECT question, answer
FROM ai_agent
WHERE question = 'How many houses were sold in 2015?';
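You can also ask a question that should route to the knowledge base skill rather than the SQL skill:
SELECT question, answer
FROM ai_agent
WHERE question = 'Summarize the key insights from the PDF report.';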
Optional: Deploy the Agent to Slack
CREATE DATABASE mindsdb_slack
WITH ENGINE = 'slack',
PARAMETERS = {
"token": "xoxb-xxx",
"app_token": "xapp-xxx"
};
CREATE CHATBOT ai_chatbot
USING
database = 'mindsdb_slack',
agent = 'ai_agent';
Your AI agent is now live on Slack, and your team can interact with it directly.
Automating Knowledge Base Updates
To refresh your knowledge base regularly, schedule a job. Here, data_source is a placeholder for whichever source feeds the knowledge base, and the LAST keyword makes each run insert only rows added since the previous execution:
CREATE JOB update_kb_hourly AS (
INSERT INTO my_knowledge_base (
SELECT * FROM data_source WHERE id > LAST
)
) EVERY hour;
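If you later want to stop the hourly refresh, drop the job:
DROP JOB update_kb_hourly;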
Example Questions You Can Ask
- "How many houses were sold in California in 2012?"
- "Summarize the key insights from the PDF report."
- "What is the average price of houses sold per state in 2015?"
Use Cases
- Internal AI assistants for querying data systems
- Customer support bots with product manuals in PDF format
- AI dashboards for real-time analytics and summaries
- Slack bots that act as knowledge agents for your team
SEO Keywords (Technical)
- Build AI agent with MindsDB
- Natural language to SQL chatbot
- LangChain agent integration
- RAG PDF chatbot with SQL
- PostgreSQL GPT chatbot
- Hugging Face Embeddings MindsDB
- MindsDB Slack integration
- Enterprise LLM agent orchestration
Conclusion
By combining MindsDB with OpenAI or Hugging Face models, you can rapidly develop intelligent agents that understand both structured and unstructured data sources. With just a few SQL-like commands, we’ve created an end-to-end AI workflow that can be deployed across business-critical applications.
For enterprise use cases, this stack scales well for document understanding, sales analytics, research summarization, and beyond.