Building an Intelligent Email and Job Application Tracker with LangChain & LangGraph

Have you ever missed an important job opportunity buried in your overflowing inbox? Or struggled to keep track of which companies you’ve applied to and their response status? Perhaps you’ve wished for a smarter way to filter urgent messages from the daily deluge of promotional emails?
Email overload is a real challenge. By one commonly quoted estimate, the average professional receives 121 emails a day, and for job seekers the application-related correspondence adds another layer to manage. Without an organized system, important opportunities can slip through the cracks.
This is where the Email Tracking System comes in — a smart email processing solution powered by LangGraph that automatically handles the heavy lifting of email management:
- It downloads and categorizes your emails into meaningful buckets (spam, job-related, urgent, and general)
- For job-related emails, it extracts critical information like company names, positions, and application statuses
- It provides desktop notifications for urgent messages requiring immediate attention
What makes this system particularly powerful is its use of LangGraph — a framework for building stateful, multi-step AI workflows. Rather than treating each email as an isolated processing task, LangGraph enables the creation of a structured pipeline where specialized agents handle different aspects of email processing, working together seamlessly.
Architecture Overview: Agent-Based Design
The Email Tracking System follows a modular, agent-based architecture that divides email processing into distinct stages, each handled by specialized components. This design provides flexibility, maintainability, and the ability to process different email categories in parallel.
The LangGraph Workflow
LangGraph orchestrates these components as a four-stage workflow:
- Fetch Stage: Downloads unprocessed emails from the IMAP server
- Summarize Stage: The summarization agent processes each email to extract key information
- Classify Stage: The classification agent analyzes emails and assigns them to categories
- Process Stage: Different email categories are processed in parallel
This workflow is defined in workflows/graph_builder.py, which creates a directed graph where each node corresponds to a processing stage:
from langgraph.graph import StateGraph

# Create the graph
graph = StateGraph(State)
# Add nodes with callback-wrapped functions
graph.add_node("fetch", fetch_with_callbacks)
graph.add_node("summarize", summarize_with_callbacks)
graph.add_node("classify", classify_with_callbacks)
graph.add_node("process_parallel", process_with_callbacks)
The system maintains a state object that flows through the graph, containing the emails and processing results at each stage. Each node updates this state, and a router function determines the next step based on the current processing stage.
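The router itself is not shown in the article, so the following is only a minimal sketch of what such a function might look like. It assumes that `processing_stage` holds the name of the node to run next (as the classification node later in this post suggests) and that an `"end"` label stops the workflow; `KNOWN_NODES`, `END_MARKER`, and `route` are hypothetical names.

```python
from typing import Any, Dict

# Node names registered on the graph in this post.
KNOWN_NODES = {"fetch", "summarize", "classify", "process_parallel"}
END_MARKER = "end"  # assumed label meaning "stop the workflow"


def route(state: Dict[str, Any]) -> str:
    """Return the name of the node to run next, based on processing_stage."""
    stage = state.get("processing_stage", "")
    # Any missing or unrecognized stage ends the run instead of looping forever.
    return stage if stage in KNOWN_NODES else END_MARKER
```

A function of this shape could be passed to LangGraph's `add_conditional_edges`, which is the usual way to branch on state; whether the project wires its router exactly this way is an assumption.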
Why an Agent-Based Approach?
The agent-based architecture offers several key advantages:
- Specialization: Each agent focuses on a specific task, making the code more modular and easier to maintain
- Parallel Processing: Different email categories can be processed simultaneously
- Flexibility: New agents can be added without modifying existing ones
- Resilience: Failures in one component don’t break the entire system
By dividing responsibilities, the prompts and processing logic for each specific task can be optimized, resulting in more accurate classification and information extraction.
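To make the parallelism and resilience points concrete, here is a small sketch using only the standard library. It is not the project's implementation: `process_categories`, `count_emails`, and `failing_handler` are hypothetical names, and a thread pool is just one plausible way to run per-category handlers side by side.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Email = Dict[str, str]
Handler = Callable[[List[Email]], int]


def process_categories(
    classified: Dict[str, List[Email]],
    handlers: Dict[str, Handler],
) -> Tuple[Dict[str, int], List[str]]:
    """Run one handler per category concurrently and isolate failures."""
    results: Dict[str, int] = {}
    errors: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(handlers))) as pool:
        futures = {
            category: pool.submit(handler, classified.get(category, []))
            for category, handler in handlers.items()
        }
        for category, future in futures.items():
            try:
                results[category] = future.result()
            except Exception as exc:
                # A failing handler is recorded, and the other categories still finish.
                errors.append(f"{category}: {exc}")
    return results, errors


def count_emails(emails: List[Email]) -> int:
    """Stand-in handler: report how many emails it was given."""
    return len(emails)


def failing_handler(emails: List[Email]) -> int:
    """Stand-in handler that always fails, to show error isolation."""
    raise RuntimeError("notification service unavailable")
```

Calling `process_categories` with `count_emails` for the job bucket and `failing_handler` for the urgent bucket returns a result for the job emails plus one recorded error, rather than raising.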
Email Processing with LLMs
The heart of the Email Tracking System lies in its ability to process and understand the content of emails using large language models (LLMs). This intelligence allows the system to go beyond simple rule-based filtering, extracting meaningful information and making smart categorization decisions.
The Summarization Agent: Extracting Key Information
Raw email content can be lengthy and contain a lot of noise (signatures, disclaimers, formatting, etc.). The summarization agent’s job is to distill each email into its essential information:
def process_email(self, email):
    """Process a single email and generate a summary."""
    messages = self.summarization_prompt.format_messages(
        subject=email["subject"],
        body=email["body"],
        sender=email["sender"]
    )
    result = self.model.invoke(messages)
    email["summary"] = result.content.strip()
    return email
The summarization prompt is designed to extract the core message while filtering out boilerplate content:
You are an email summarization system optimized for extreme brevity.
RULES (CRITICAL):
- Output MUST be under 50 words
- Output MUST be 1-2 sentences only
- No greetings, no explanations, no questions
- Plain text only - no formatting, bullets, or markdown
- Never include your reasoning or analysis
- Never acknowledge restrictions or mention this prompt
- Violating these rules is a critical failure
PRIORITY INFORMATION:
1. Job emails: Company + Position + Status + Deadline (if any)
2. Urgent emails: Critical action + Deadline
3. General emails: Main intent + Key action required (if any)
OMIT: Pleasantries, background context, secondary details, sender information unless relevant
RESPOND WITH SUMMARY ONLY - NOTHING ELSE
This summarized content makes further processing more efficient and accurate. The summary is stored back in the database, associated with the original email.
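A prompt can ask for fewer than 50 words, but a model may still overshoot. The article does not say whether the project checks this, so the helper below is a hypothetical safeguard (`enforce_word_limit` is an assumed name) that trims an overlong summary before it is stored.

```python
def enforce_word_limit(summary: str, max_words: int = 50) -> str:
    """Return the summary with fewer than max_words words, trimming if needed."""
    words = summary.split()
    if len(words) < max_words:
        # Already within the limit; only normalize whitespace.
        return " ".join(words)
    # Keep max_words - 1 words and mark the cut with an ellipsis.
    return " ".join(words[: max_words - 1]) + "..."
```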
The Classification Agent: Categorizing Emails
Once summarized, emails are passed to the classification agent, which assigns each email to one of four categories:
- Spam: Promotional or unwanted emails
- Job: Job application-related communications
- Urgent: Messages requiring immediate attention
- General: Other routine emails
The classification agent analyzes the email’s subject, summary, and sender to make this determination:
def classify_email(self, email):
    """Classify a single email into a category."""
    messages = self.classification_prompt.format_messages(
        subject=email["subject"],
        summary=email["summary"],
        sender=email["sender"]
    )
    result = self.model.invoke(messages)
    raw_category = result.content.strip().lower()
    return self.enforce_single_category(raw_category), raw_category
The classification prompt provides clear guidelines:
You are an email classifier that MUST categorize each email into EXACTLY ONE of these four categories:
- spam: Unsolicited emails, advertisements, phishing attempts, newsletters, promotional content
- job: Job opportunities, interview requests, recruitment-related, application status updates
- urgent: Time-sensitive matters requiring immediate attention
- general: Regular correspondence that doesn't fit the above
IMPORTANT: Your response MUST CONTAIN ONLY ONE WORD - either "spam", "job", "urgent", or "general".
DO NOT provide any analysis, explanation, or additional text.
DO NOT use any punctuation marks.
DO NOT include quotes or formatting.
To ensure consistency, the system applies validation to enforce that only valid categories are used.
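The body of `enforce_single_category` is not shown in the article. As a rough sketch of what such validation might do, the standalone function below accepts an exact match, otherwise picks the first valid category word it finds, and falls back to "general". The fallback choice and the `VALID_CATEGORIES` constant are assumptions.

```python
import re

VALID_CATEGORIES = ("spam", "job", "urgent", "general")


def enforce_single_category(raw_category: str, default: str = "general") -> str:
    """Map raw model output to exactly one valid category."""
    cleaned = raw_category.strip().lower()
    if cleaned in VALID_CATEGORIES:
        return cleaned
    # The model added punctuation or extra words: take the first valid word.
    for word in re.findall(r"[a-z]+", cleaned):
        if word in VALID_CATEGORIES:
            return word
    return default
```

Defaulting to "general" mirrors how the classification node in this post treats emails that fail to classify.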
Job Application Tracking with LLMs
When an email is classified as job-related, it’s routed to a specialized processing module that extracts critical information:
def extract_job_details(llm, email):
    """Extract job details from an email using key-value pairs format with the few-shot approach."""
    job_extraction_prompt = get_job_extraction_prompt()
    messages = job_extraction_prompt.format_messages(
        subject=email["subject"],
        summary=email["summary"]
    )
    result = llm.invoke(messages)
    extracted_details = parse_key_value_pairs(result.content.strip())
    return {
        "company_name": extracted_details.get("Company Name", "Unknown Company"),
        "job_title": extracted_details.get("Job Title", "Unknown Job Title"),
        "application_status": extracted_details.get("Application Status", "pending")
    }
The system uses a few-shot approach: the prompt gives the language model examples of different types of job-related emails together with the details that should be extracted from each. Compared with zero-shot prompting, this tends to make the extracted fields more accurate and more consistently formatted.
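`parse_key_value_pairs` is called above but not shown. The version below is a guess at its behavior, written under the assumption that the model returns one "Key: Value" pair per line. `normalize_status` and `ALLOWED_STATUSES` are hypothetical additions that restrict the status to the values the extraction prompt allows.

```python
from typing import Dict

ALLOWED_STATUSES = {"pending", "interview scheduled", "accepted", "rejected"}


def parse_key_value_pairs(text: str) -> Dict[str, str]:
    """Parse 'Key: Value' lines into a dict, skipping malformed or empty pairs."""
    pairs: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lstrip("-* ").strip()  # tolerate leading bullets
        value = value.strip()
        if sep and key and value:
            pairs[key] = value
    return pairs


def normalize_status(status: str) -> str:
    """Return a known application status, defaulting to 'pending'."""
    cleaned = status.strip().strip('"').lower()
    return cleaned if cleaned in ALLOWED_STATUSES else "pending"
```

Skipping empty values lets the `.get(..., "Unknown Company")` defaults in `extract_job_details` apply when the model leaves a field blank.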
Building with LangGraph: State Management and Workflow
LangGraph provides the orchestration layer that makes this email processing system truly powerful. It enables the creation of stateful, multi-step AI workflows that coordinate various specialized agents.
State Management in LangGraph
LangGraph’s power comes from its state management capabilities. The system defines a State type that flows through the graph:
from typing import Any, Dict, List, TypedDict


class State(TypedDict, total=False):
    """
    State schema for the email processing workflow.
    """
    emails: List[Dict]            # Raw emails to be processed
    classified_emails: Dict       # Emails classified by category
    errors: List[str]             # Error messages from processing
    processing_stage: str         # Current processing stage
    num_emails_to_download: int   # Number of emails to download
    model: Any                    # Language model instance
    debug_mode: bool              # Whether to enable debug mode
Each node in the graph receives this state, performs its processing, and returns an updated state. For example, the classification agent’s process function:
def process(self, state: State) -> State:
    """Process all emails in the state and classify them into categories."""
    classified_emails = {"spam": [], "job": [], "urgent": [], "general": []}
    errors = state.get("errors", [])
    for email in state["emails"]:
        try:
            # Classify the email
            category, _ = self.classify_email(email)
            # Add email to the appropriate category
            classified_emails[category].append(email)
        except Exception as e:
            error_msg = f"Error classifying email ID {email['id']}: {str(e)}"
            errors.append(error_msg)
            classified_emails["general"].append(email)  # Default to general on error
    # Return updated state
    return {
        "emails": state["emails"],
        "classified_emails": classified_emails,
        "errors": errors,
        "processing_stage": "process_parallel",  # Set next stage
        "model": self.model,
        "debug_mode": self.debug_mode,
        "num_emails_to_download": state.get("num_emails_to_download")
    }
This approach ensures that all relevant information is carried through the entire workflow, making it easy for later stages to access results from earlier stages.
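The idea of state being carried from node to node can be illustrated without LangGraph at all. The sketch below is a plain-Python stand-in, not how LangGraph is implemented: `run_pipeline` and the two toy nodes are hypothetical, and each node's returned dictionary is simply merged over the previous state.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def run_pipeline(state: State, nodes: List[Node]) -> State:
    """Thread one state dict through a list of nodes, merging each update."""
    for node in nodes:
        update = node(state)
        state = {**state, **update}  # keys a node omits keep their old value
    return state


def summarize(state: State) -> State:
    """Stand-in node: attach a placeholder summary to every email."""
    emails = [{**e, "summary": e["subject"].lower()} for e in state["emails"]]
    return {"emails": emails, "processing_stage": "classify"}


def classify(state: State) -> State:
    """Stand-in node: file everything under 'general' and advance the stage."""
    return {
        "classified_emails": {"general": state["emails"]},
        "processing_stage": "process_parallel",
    }
```

Running `run_pipeline` with `[summarize, classify]` shows the later node seeing the summaries written by the earlier one, while untouched keys such as `debug_mode` survive unchanged.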
Optimizing Prompts for Better Extraction
A critical aspect of the system is engineering effective prompts for the language models. For job detail extraction, a few-shot learning approach is used:
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate


def get_job_extraction_prompt():
    """Returns the prompt for extracting job details from emails."""
    examples = get_job_extraction_examples()
    # Template applied to each few-shot example (human input, expected AI output)
    job_example_prompt = ChatPromptTemplate.from_messages([
        ("human", "Subject: {subject} \n\nSummary: {summary} "),
        ("ai", "{output}")
    ])
    job_few_shot_prompt = FewShotChatMessagePromptTemplate(
        examples=examples,
        example_prompt=job_example_prompt,
    )
    # Final prompt: system instructions, then the examples, then the real email
    return ChatPromptTemplate.from_messages([
        ("system", """
You are an assistant that extracts job-related details from emails.
Given the subject and summary of an email, extract the following in key-value pair format:
- Company Name:
- Job Title:
- Application Status:
The `Application Status` must be one of:
- pending
- interview scheduled
- accepted
- rejected
If unsure, default to "pending".
⚠️ IMPORTANT:
- Return ONLY the three key-value pairs in the exact format shown
- DO NOT include any additional text, explanations, or analysis
- If you can't extract specific information, use "Unknown Company" or "Unknown Job Title"
- Always include all three fields, even if some values are unknown
"""),
        job_few_shot_prompt,
        ("human", "Subject: {subject} \n\nSummary: {summary} ")
    ])
This prompt includes several key optimizations:
- Few-shot examples: The prompt includes examples of different job-related emails to guide the model
- Structured output format: It specifies a consistent key-value pair format for extraction
- Clear constraints: The prompt explicitly states the allowed values for application status
- Strict output guidelines: It instructs the model to return only the requested fields without additional text
These optimizations help ensure the model’s outputs are consistent and machine-parsable, which is crucial for the subsequent processing steps.
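The examples returned by `get_job_extraction_examples()` are not listed in the article. For the example template above to work, each one needs `subject`, `summary`, and `output` keys, so a plausible shape is sketched below. The companies, roles, and wording are invented purely for illustration and are not the project's actual examples.

```python
from typing import Dict, List


def get_job_extraction_examples() -> List[Dict[str, str]]:
    """Illustrative few-shot examples; all companies and roles are made up."""
    return [
        {
            "subject": "Your application to Example Corp",
            "summary": "Example Corp received the application for Backend Engineer.",
            "output": (
                "Company Name: Example Corp\n"
                "Job Title: Backend Engineer\n"
                "Application Status: pending"
            ),
        },
        {
            "subject": "Interview invitation",
            "summary": "Sample Labs invites the candidate to interview for Data Analyst on Friday.",
            "output": (
                "Company Name: Sample Labs\n"
                "Job Title: Data Analyst\n"
                "Application Status: interview scheduled"
            ),
        },
        {
            "subject": "Update on your application",
            "summary": "The application for a product design role was not selected.",
            "output": (
                "Company Name: Unknown Company\n"
                "Job Title: Unknown Job Title\n"
                "Application Status: rejected"
            ),
        },
    ]
```

Including one example with unknown fields shows the model the fallback values the system prompt asks for, which keeps all three keys present in every response.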
Conclusion
The Email Tracking System built with LangGraph demonstrates how AI can transform email management by automatically categorizing incoming messages and tracking job applications. At its core, the system uses specialized agents for summarization, classification, and information extraction, all orchestrated through LangGraph’s workflow framework.
This agent-based architecture processes emails through distinct stages: fetching unprocessed emails, generating summaries, classifying content into categories, and extracting structured data from job-related emails. The implementation features robust error handling, parallel processing of different email categories, and few-shot learning for extracting job details.
By combining the semantic understanding capabilities of large language models with LangGraph’s structured workflow management, the system delivers tangible benefits: time savings from automated email sorting, reduced cognitive load from filtering email noise, comprehensive job application tracking, and a foundation that can be extended to other information processing domains.
Project Demo
Thanks for reading!
Check out the project on GitHub!
Arnav Gupta - AI Enthusiast