Secure, Swift, and Smart: A Basic Guide to Building AI Agentic Workflows with Local Models

Artificial intelligence is transforming how we solve problems, but flagship models like GPT-4 come with drawbacks: high costs, latency, and privacy concerns when sensitive data leaves your network. Local models offer a powerful alternative—smaller, faster, and fully contained within your infrastructure. In this basic guide, we’ll walk through building an AI agentic workflow using a local 14B parameter model to create a SQL agent that autonomously handles database queries. This overview focuses on key concepts and best practices to make it efficient, reliable, and enterprise-ready.

Why Local Models?

Local models require more setup than commercial APIs, often needing custom training to perform at their best, and they don’t match the broad intelligence of flagship models. However, their strengths are clear:

  • Speed: Smaller models mean faster inference, ideal for real-time tasks.
  • Privacy: Data never leaves your network, crucial for sensitive enterprise applications.
  • Cost: No recurring API fees—just your hardware investment.

Our example uses a FastAPI app with a local model to build a SQL agent. Let’s explore the process.

Step 1: Define the Agent’s Role

An agentic workflow empowers an AI to handle multi-step tasks by reasoning and using tools. Our SQL agent interprets user prompts, identifies databases and tables, and executes queries—all locally.

Here’s the system instruction:

SYSTEM_INSTRUCTION = """
You are an expert MS SQL database admin agent. Your job is to assist users in querying a Microsoft SQL Server database using the provided tools. Address users casually and:

1. Identify database and table names, even from vague inputs.
2. Use tools to resolve ambiguities before asking for clarification.
3. Handle SQL operations: SELECTs, schema inspection, stored procedures, and more.
4. Default to 25 records for general SELECTs unless specified.
5. Format results clearly: tables for data, SQL for schemas.
6. Only seek clarification when truly stuck.

Available tools:
- list_databases: List all databases.
- all_table_names: List tables in a database.
- filter_table_names: Find tables matching a substring.
- schema_definitions: Get table schemas.
- execute_query: Run a SQL query.
"""

This defines the agent’s purpose and toolset, setting the stage for autonomous operation.

Step 2: Set Up the Framework

We use FastAPI for the API and an agent framework to manage the workflow. Here’s the basic structure:

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from mcp_agent.app import MCPApp
from mcp_agent.agents.agent import Agent

app = FastAPI(title="SQL Agent API", version="2.0")
mcp_app = MCPApp(name="sql_agent")

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()
    user_prompt = next((m["content"] for m in reversed(body.get("messages", [])) if m["role"] == "user"), "")
    async def stream_response():
        async for chunk in sql_agent_workflow(user_prompt):
            yield chunk
    return StreamingResponse(content=stream_response(), media_type="text/event-stream")

This sets up a streaming endpoint, leveraging asynchronous generators to deliver real-time responses—a feature that enhances user experience by providing immediate feedback.
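To see the streaming from the client side, here's a minimal sketch using httpx (the localhost URL and port are assumptions; point it at your own deployment):

import httpx

async def ask_agent(prompt: str) -> None:
    payload = {"messages": [{"role": "user", "content": prompt}]}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", "http://localhost:8000/v1/chat/completions", json=payload) as response:
            async for chunk in response.aiter_text():
                print(chunk, end="", flush=True)  # chunks print as the agent works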

Step 3: Optimize with Logical Classifiers

To save resources, we classify queries with logic before invoking the AI:

import re

def classify_query(prompt: str) -> str:
    # Cheap keyword/regex routing before the model is ever invoked
    prompt_lower = prompt.lower()
    if re.search(r"execute\s+this\s+query[^:]*:", prompt_lower):
        return "execute_query"
    if "schema" in prompt_lower and "table" in prompt_lower:
        return "table_schema"
    return "general_query"

This reduces latency by handling straightforward tasks without AI overhead.
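A few illustrative calls (the prompts are invented for this sketch):

classify_query("Execute this query on timesheets: SELECT TOP 5 * FROM entries;")  # -> "execute_query"
classify_query("Show me the schema of the users table")                           # -> "table_schema"
classify_query("Which customers signed up last month?")                           # -> "general_query"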

Step 4: Extract Context Intelligently

The agent infers context like database names from prompts:

def extract_database_name(prompt: str) -> str:
    prompt_lower = prompt.lower()
    db_aliases = {"enterprisedb": "enterprise_db", "timesheets": "timesheets_db"}
    # \b keeps the preposition from matching inside words like "season"
    explicit_match = re.search(r"\b(?:in|on|for|from)\s+([a-z0-9_\s.-]+?)(?:\s+database|\s+db)?$", prompt_lower)
    if explicit_match:
        db_name = explicit_match.group(1).strip()
        return db_aliases.get(db_name, db_name)
    return "master"  # Default fallback

This ensures the workflow adapts to vague inputs efficiently.
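For instance (invented prompts; note the alias resolution and the fallback):

extract_database_name("count rows in timesheets")          # -> "timesheets_db" (alias)
extract_database_name("list tables for enterprisedb db")   # -> "enterprise_db" (alias)
extract_database_name("show me the server version")        # -> "master" (fallback)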

Step 5: Leverage Tool Calling with Caching

Tools interact with the database, and caching boosts performance:

import json
import time
from typing import Any, Dict

TOOL_RESULT_CACHE: Dict[str, Dict[str, Any]] = {}
DEFAULT_CACHE_TTL_SECONDS = 600

async def call_tool_with_cache(sql_agent: Agent, tool_name: str, tool_args: Dict[str, Any], logger, ttl_seconds: int = DEFAULT_CACHE_TTL_SECONDS) -> str:
    cache_key = f"{tool_name}::{json.dumps(tool_args, sort_keys=True)}"
    current_time = time.time()
    if cache_key in TOOL_RESULT_CACHE and (current_time - TOOL_RESULT_CACHE[cache_key]["timestamp"]) < ttl_seconds:
        logger.info(f"Cache HIT for '{tool_name}'")
        return TOOL_RESULT_CACHE[cache_key]["result_text"]
    result = await sql_agent.call_tool(tool_name, tool_args)
    result_text = result.content[0].text if result.content else ""
    TOOL_RESULT_CACHE[cache_key] = {"timestamp": current_time, "result_text": result_text}
    return result_text

Caching minimizes redundant calls, improving speed.
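One thing the sketch above leaves out is eviction: expired entries linger until the same key is requested again. A small optional helper (hypothetical, not part of the original code) keeps the in-process cache tidy:

def prune_tool_cache(ttl_seconds: int = DEFAULT_CACHE_TTL_SECONDS) -> int:
    """Drop expired entries from TOOL_RESULT_CACHE; returns how many were removed."""
    now = time.time()
    expired = [key for key, entry in TOOL_RESULT_CACHE.items() if now - entry["timestamp"] >= ttl_seconds]
    for key in expired:
        del TOOL_RESULT_CACHE[key]
    return len(expired)

Also worth noting: a module-level dict is per-process, so if you run multiple workers, each keeps its own cache; a shared store like Redis would be the next step up.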

Step 6: Build the Core Workflow with Human Intervention

The workflow orchestrates everything, including human intervention when needed:

from typing import AsyncGenerator

from mcp_agent.workflows.llm.augmented_llm import RequestParams
from mcp_agent.workflows.llm.augmented_llm_openai import OpenAIAugmentedLLM

async def sql_agent_workflow(user_prompt: str) -> AsyncGenerator[str, None]:
    async with mcp_app.run() as agent_app:
        sql_agent = Agent(name="sql_agent", instruction=SYSTEM_INSTRUCTION, server_names=["sql_server"])
        async with sql_agent:
            llm = await sql_agent.attach_llm(OpenAIAugmentedLLM)  # Local 14B model behind an OpenAI-compatible endpoint
            query_type = classify_query(user_prompt)
            db_name = extract_database_name(user_prompt)

            if query_type == "execute_query":
                match = re.search(r"execute\s+this\s+query[^:]*:\s*([^;]+);?", user_prompt, re.IGNORECASE)
                if match:
                    result = await call_tool_with_cache(sql_agent, "execute_query", {"query": match.group(1).strip(), "database": db_name}, agent_app.logger)
                    yield f"Dude, here’s your result:\n{result}"
                    return

            max_iterations = 25
            for _ in range(max_iterations):
                response = await llm.generate_str(message=user_prompt, request_params=RequestParams(model="local_14b"))
                # Tag name is illustrative; the original tag was stripped in publishing
                if "<clarification>" in response:
                    question = response.split("<clarification>")[1].split("</clarification>")[0]
                    yield "Hey, I’m stuck! Can you clarify: " + question
                    return
                tool_name, tool_args = await extract_tool_call(response)
                if tool_name:
                    result = await call_tool_with_cache(sql_agent, tool_name, tool_args, agent_app.logger)
                    yield f"Tool {tool_name} result:\n{result}"
                    if tool_name == "execute_query":
                        return
                    # Fold the tool result back into the prompt so the next turn has context
                    user_prompt = f"Based on: {result}, continue: {user_prompt}"
                else:
                    # No tool call and no clarification request: treat the text as the final answer
                    yield response
                    return

This uses an asynchronous generator for streaming and maintains conversation context by folding each tool result back into the prompt, so state carries across iterations. Crucially, it supports human intervention: when the agent hits ambiguity it can’t resolve with tools, it wraps a question in a clarification tag (shown as <clarification> in the sketch above) and hands control back to the user, preventing infinite loops and ensuring progress.
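The loop above also calls extract_tool_call, which isn't shown in this guide. Its exact shape depends on how your local model is tuned to emit tool calls; here's a minimal sketch assuming the model produces a JSON object like {"tool": "...", "args": {...}} somewhere in its text:

import json
import re
from typing import Any, Dict, Optional, Tuple

async def extract_tool_call(response: str) -> Tuple[Optional[str], Dict[str, Any]]:
    # Assumption: the model emits {"tool": ..., "args": {...}} in its output.
    # Adapt the pattern to whatever format your model actually produces.
    match = re.search(r'\{.*"tool".*\}', response, re.DOTALL)  # naive: first brace to last brace
    if not match:
        return None, {}
    try:
        payload = json.loads(match.group(0))
        return payload.get("tool"), payload.get("args", {})
    except json.JSONDecodeError:
        return None, {}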

Step 7: Handle Complexity and Scale

A 14B model needs robust hardware (e.g., an NVIDIA A40 or A100 GPU). Scaling might involve quantization or adding GPUs. There are no API costs, but fine-tuning requires expertise.

Step 8: Test and Deploy

Test the classifiers with unit tests and the full workflow with integration tests, and monitor performance through the logging already built into the code.
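For example, the classifier lends itself to table-driven tests (a minimal pytest sketch with invented prompts):

import pytest

@pytest.mark.parametrize("prompt, expected", [
    ("Execute this query on master: SELECT 1;", "execute_query"),
    ("What is the schema of the orders table?", "table_schema"),
    ("How many users signed up today?", "general_query"),
])
def test_classify_query(prompt, expected):
    assert classify_query(prompt) == expected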

[Figure: basic flow of the agentic workflow]

Wrap-Up

This workflow combines speed, privacy, and smarts. Key features include:

  • Human Intervention: The agent asks for clarification when stuck, improving reliability.
  • Streaming Responses: Asynchronous generators provide real-time feedback.
  • Conversation Context: State persists across interactions for seamless follow-ups.
  • Caching: Reduces redundant database calls for efficiency.

Local models are evolving—swap in a better one, and this setup will adapt. Let me know if you want to dig deeper, or if you have any suggestions or questions. All ears. Hope this helps!
