A Comprehensive Guide to Building a Deep Research Agent with Qwen3 Locally

Introduction

Qwen 3 is here and it's impressive.

As a powerful local LLM, it inspired me to put it to the test with a real-world project: building a local deep research assistant.

In this blog, I’ll walk you through creating one from scratch using Qwen3 with Ollama.

You’ll also learn how to build smart workflows with LangGraph and track everything in a clean, user-friendly interface—all using practical, real-world methods.

Sound exciting? Let’s get started by looking at how it all works!

How It Works

Local Deep Researcher is inspired by IterDRAG.

We will be using the IterDRAG approach to build our agent. Here is a simple diagram to help you understand the flow in terms of LangGraph.

Workflow

In this approach, we decompose a query into sub-queries, retrieve documents for each sub-query, answer it, and then build on that answer when retrieving documents for the next sub-query.

In LangGraph, nodes (rectangular blocks) represent execution steps, while edges (arrows) represent the flow between them. I hope the diagram is clear.

However, I will explain more details in the respective sections.
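To make the flow concrete before we get to LangGraph, here is a rough, plain-Python sketch of the iterative loop. The helper names are purely hypothetical placeholders for the graph nodes we will build later:

# conceptual sketch of the IterDRAG-style loop (helper functions are hypothetical placeholders)
def deep_research(research_topic: str, max_loops: int = 3) -> str:
    summary = ""
    query = generate_query(research_topic)        # decompose: derive the first sub-query
    for _ in range(max_loops):
        docs = web_search(query)                  # retrieve documents for the current sub-query
        summary = summarize(summary, docs)        # answer it / extend the running summary
        query = reflect_and_follow_up(summary)    # find gaps and produce the next sub-query
    return summary

Each of those steps maps onto a node in the LangGraph graph, and the arrows between them are the edges.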

The Code

Now that you have understood how our project is going to work, let's get building.

For simplicity, I have divided this section into three subparts: Workspace Setup, Main Code, and Running the Program. If you are a seasoned developer, you can skip to the main code; otherwise, kindly follow the workspace setup.

Workspace Setup

To ensure our code runs in an isolated environment, let's start setting up the workspace.

1. Define Folder Structure (optional)

Head to the terminal and enter the following commands one by one (Windows shell syntax shown):

# create a base folder
mkdir qwen3_deep_research_agent
cd qwen3_deep_research_agent

# Create files in the root directory
type nul > .env
type nul > langgraph.json
type nul > pyproject.toml

# Create directories
mkdir src
mkdir src\ollama_deep_researcher

# Create files in src\ollama_deep_researcher
type nul > src\ollama_deep_researcher\__init__.py
type nul > src\ollama_deep_researcher\configuration.py
type nul > src\ollama_deep_researcher\graph.py
type nul > src\ollama_deep_researcher\prompts.py
type nul > src\ollama_deep_researcher\state.py
type nul > src\ollama_deep_researcher\utils.py

Once executed, verify that your folder structure looks like this:

. (qwen3_deep_research_agent)
├── .env
├── langgraph.json
├── pyproject.toml
└── src
    └── ollama_deep_researcher
        ├── __init__.py
        ├── configuration.py
        ├── graph.py
        ├── prompts.py
        ├── state.py
        └── utils.py

Now that we have the required files in the root, it's time to fill them out. For simplicity, you can click on each of the filenames and copy the code:

  • .env - stores all the environment variables and secrets
  • pyproject.toml - acts as a project config file to standardise & simplify the configuration of Python projects (a rough sketch follows this list).

or

  • requirements.txt - alternatively, you can create a requirements.txt file to store all the dependencies (you will have to create it yourself, though).
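For reference, here is an illustrative sketch of what a minimal pyproject.toml could look like. The dependency names come from the libraries used later in this post, but the exact contents and versions are assumptions on my part, so prefer the actual file linked above:

[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"

[project]
name = "ollama-deep-researcher"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "langchain-ollama",
    "langchain-core",
    "langgraph",
    "langsmith",
    "httpx",
    "markdownify",
]

[tool.setuptools.packages.find]
where = ["src"]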

2. Virtual Environment Setup

Next, head to the root folder in the terminal and type:

python -m venv .venv
.venv\Scripts\activate

Ensure the output shows:

(.venv) path/to/working/dir

This activates the virtual environment.

Next do:

pip install -e .

This command looks for a pyproject.toml file and installs everything listed in its dependencies section.

The -e flag means editable mode, which links the project directory to the environment (rather than copying files into the site-packages folder), so edits to your modules are available instantly without reinstalling. Pretty handy!

With this, our working environment is set up. Time to write the main code.

Writing Main Code (agent.py)

Assuming you are clear on the logic in “How It Works”, let’s start by loading all the required libraries.

1. Load All Required Libraries

The project uses various libraries for web interaction (httpx, markdownify), LLM agent orchestration (langchain_ollama, langchain_core, langsmith, langgraph), and structured data handling (json, dataclasses, typing, operator).

# essentials & data handling
import json
import operator
import re
from dataclasses import dataclass, field
from typing import Dict, Any, List, Union, Optional, TypedDict, Annotated, Literal

# model
from langchain_ollama import ChatOllama

# LangChain / LangGraph components
from langsmith import traceable
from langchain_core.runnables import RunnableConfig
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.graph import START, END, StateGraph

Next, let’s define the local LLM to use.

2. Define Local LLM

I will use the qwen3:8b-q4_K_M model for the demo, i.e., Qwen 3 with 8 billion parameters quantized to 4 bits for lower GPU memory usage. You are free to use your own.

Paste the following code:

# imports
from langchain_ollama import ChatOllama

# instantiate the model
local_llm = "qwen3:8b-q4_K_M"
llm = ChatOllama(model=local_llm, temperature=0)

# a second instance with JSON output mode enabled
llm_json_mode = ChatOllama(model=local_llm, temperature=0, format="json")

The code imports the ChatOllama class, creates an instance of the Qwen3 model with temperature set to 0, and creates a second instance with format='json' to activate JSON mode.
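As a quick sanity check (assuming Ollama is running locally and the model has already been pulled), you can invoke the JSON-mode instance directly; the exact reply will vary, but it should be valid JSON:

# quick sanity check: ask the JSON-mode model for a trivial JSON response
response = llm_json_mode.invoke([
    SystemMessage(content="Return a JSON object with a single key 'status'."),
    HumanMessage(content="Are you ready?"),
])
print(response.content)  # expect something like {"status": "ready"}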

Next, let’s define the states.

3. Define States

In LangGraph, states are structures/objects that preserve all the information during the agent's lifetime. Think of them as memories that store every interaction and are updated based on the agent's actions.

Let’s define 3 states:

  • SummaryState - to store all interaction data of the agent,
  • SummaryStateInput - to store the user-provided input - in this case, research_topic, the only part exposed to the user,
  • SummaryStateOutput - to store the agent's output.

Here is how to define it:

# summary state data class - core & preserves all important information
@dataclass(kw_only=True)
class SummaryState:
    """Summary state data class."""
    research_topic: str = field(default=None) # report topic
    search_query: str = field(default=None) # search query
    web_research_results: Annotated[list, operator.add] = field(default_factory=list) # web research results
    sources_gathered: Annotated[list, operator.add] = field(default_factory=list) # sources gathered (urls)
    research_loop_count: int = field(default=0) # research loop count - for iteration tracking
    final_summary: str = field(default=None) # final report

# summary state input class - to let the user define the research topic
@dataclass(kw_only=True)
class SummaryStateInput:
    """User input."""
    research_topic: str = field(default=None) # report topic

# summary state output class - to store all the output info
@dataclass(kw_only=True)
class SummaryStateOutput:
    """Summary output."""
    final_summary: str = field(default=None) # final report

A lot is going on here, but let me simplify a bit:

  • The SummaryState class holds fields that track the entire research and summarisation workflow, including:

    • research_topic and search_query - to capture the focus of the work,
    • web_research_results and sources_gathered - lists to store accumulated findings and source URLs,
    • research_loop_count - to track how many iterative steps the research has gone through.
    • final_summary - to hold the completed summary report.

    Using @dataclass(kw_only=True) enforces keyword-only initialisation, improving clarity and reducing errors during object construction; a short construction example follows this list.

  • Then, SummaryStateInput, another small dataclass, defines what the user needs to provide to kick off the research process—currently, just a single field: research_topic.

  • Finally, SummaryStateOutput, also a dataclass, holds the result of the process, specifically, the final_summary.
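For illustration, constructing a state object looks like this (the topic string is just a made-up example):

# keyword-only construction: fields must be passed by name
state = SummaryState(research_topic="quantization techniques for local LLMs")
print(state.research_loop_count)  # 0 (default value)
# SummaryState("some topic")  # would raise a TypeError because of kw_only=True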

Next, let’s add all the required prompts.

4. Define Prompts

A prompt defines how the model should act and is often considered the most essential part of any AI application.

As a side note, be specific, explicit and detailed when writing your prompt to generate the best possible output.

For our application, we will define three prompts:

  • Query Writer Prompt - generates a search query from the given research topic; effectively the entry point of the workflow.
  • Summarise Prompt - summarises all the text fetched by the web research step into a running summary.
  • Reflection Prompt - reflects on the summary, finds gaps, and generates a follow-up question for the next web search.

Here are the prompts I have used; feel free to modify them if you like:

# query writer prompt - model feeding prompt
query_writer_prompt = """Your goal is to generate a targeted web search query.
The query will gather information related to a specific topic.

Topic:
{research_topic}

Return your query as a JSON object:
{{
    "query": "string",
    "aspect": "string",
    "rationale": "string"
}}
"""
# rationale - why this query is important; encourages the model to think about the query it generates
# summarizer instructions prompt - 1st summarisation
summarizer_instructions_prompt="""Your goal is to generate a high-quality summary of the web search results.

When EXTENDING an existing summary:
1. Seamlessly integrate new information without repeating what's already covered
2. Maintain consistency with the existing content's style and depth
3. Only add new, non-redundant information
4. Ensure smooth transitions between existing and new content

When creating a NEW summary:
1. Highlight the most relevant information from each source
2. Provide a concise overview of the key points related to the report topic
3. Emphasize significant findings or insights
4. Ensure a coherent flow of information

In both cases:
- Focus on factual, objective information
- Maintain a consistent technical depth
- Avoid redundancy and repetition
- DO NOT use phrases like "based on the new results" or "according to additional sources"
- DO NOT add a preamble like "Here is an extended summary ..." Just directly output the summary.
- DO NOT add a References or Works Cited section.
"""

# reflection prompt - for the agent's internal thinking
reflection_instructions_prompt = """You are an expert research assistant analyzing a summary about {research_topic}.

Your tasks:
1. Identify knowledge gaps or areas that need deeper exploration
2. Generate a follow-up question that would help expand your understanding
3. Focus on technical details, implementation specifics, or emerging trends that weren't fully covered

Ensure the follow-up question is self-contained and includes necessary context for web search.

Return your analysis as a JSON object:
{{ 
    "knowledge_gap": "string",
    "follow_up_query": "string"
}}"""

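Before wiring these prompts into the graph, here is an illustrative sketch (not the project's actual node code) of how the query writer prompt can be combined with the JSON-mode model defined earlier:

# illustrative only: format the prompt with the research topic and parse the JSON reply
formatted_prompt = query_writer_prompt.format(research_topic="local LLM quantization")
result = llm_json_mode.invoke([
    SystemMessage(content=formatted_prompt),
    HumanMessage(content="Generate a query for web search:"),
])
search_query = json.loads(result.content)["query"]
print(search_query)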

The query_writer_prompt instructs the model to return its output as a JSON object, and the same goes for the reflection prompt.
All this continues in a loop till a specific condition is met. (spoiler alert