Building an Excel Summarizer Using Streamlit, Karo, and LLMs
In this tutorial, We'll show you how to build a Streamlit application that can read Excel files and generate summaries using large language models (LLMs) like GPT-4 or Claude. We'll use the Karo framework to simplify our interaction with these AI models, creating a tool that reads Excel data and produces insightful summaries and key takeaways. What We're Building Our application will: Allow users to upload Excel files Read and display a preview of the data Use OpenAI or Anthropic APIs to generate insights about the data Present summaries and key takeaways in a user-friendly interface Prerequisites Basic knowledge of Python A computer with Python installed (3.8 or higher recommended) An OpenAI API key or Anthropic API key (you'll need to have funds in your account) A GitHub account (for deployment) Step 1: Setting Up Your Development Environment Let's start by creating a dedicated project folder and setting up a virtual environment: # Create project directory mkdir excel_summarizer cd excel_summarizer # Create a virtual environment # For Windows python -m venv venv # For macOS/Linux python3 -m venv venv # Activate the virtual environment # For Windows venv\Scripts\activate # For macOS/Linux source venv/bin/activate Next, open the project in your favorite code editor. If you're using Visual Studio Code: code . Step 2: Installing Required Packages Install the necessary packages: pip install karo streamlit python-dotenv pandas openpyxl chromadb Note: We're installing chromadb as it works with the base agent. Step 3: Creating the Excel Reader Tool Create a new file called excel_tool_reader.py with the following code: import pandas as pd import os import logging from typing import List, Dict, Any, Optional, Union from pydantic import Field, FilePath from karo.tools.base_tool import BaseTool, BaseToolInputSchema, BaseToolOutputSchema logger = logging.getLogger(__name__) class ExcelReaderInput(BaseToolInputSchema): """Input schema for the ExcelReader tool.""" file_path: FilePath = Field(..., description="Path to the Excel file.") sheet_name: Optional[Union[str, int]] = Field(None, description="Specific name of sheet") max_rows: Optional[int] = Field(100, description="Maximum number of rows to read from each sheet.") max_cols: Optional[int] = Field(20, description="Maximum number of columns to read from each sheet.") class ExcelReaderOutput(BaseToolOutputSchema): """Output schema for the ExcelReader tool.""" file_path: str = Field(..., description="The path of file that was read") sheet_name_read: str = Field(..., description="The name of the sheet that was actually read") data_preview: Optional[str] = Field(None, description="A string representation of the first few rows/columns of the data") row_count: Optional[int] = Field(None, description="The total number of rows read") column_names: Optional[List[str]] = Field(None, description="List of column names read (up to max_cols)") class ExcelReader(BaseTool): """Tool to read data from an Excel file.""" name = "excel_reader" description: str = "Reads data from an Excel file and returns a preview of the data." input_schema = ExcelReaderInput output_schema = ExcelReaderOutput def __init__(self, config: Optional[Any] = None): """Initialize the ExcelReader tool.""" logger.info("ExcelReaderTool initialized.") pass def run(self, input_data: ExcelReaderInput) -> ExcelReaderOutput: """Reads the specificied Excel file and returns a data preview.""" # Read the specified sheet from the Excel file try: import openpyxl except ImportError: logger.error("openpyxl is not installed. Please install it to read Excel files.") return self.output_schema(success=False, error_message="openpyxl is not installed.", file_path=str(input_data.file_path), sheet_name_read="N/A") if not isinstance(input_data, self.input_schema): return self.output_schema(success=False, error_message="Invalid input data format.", file_path=str(input_data.file_path), sheet_name_read="N/A") file_path_str = str(input_data.file_path) if not os.path.exists(file_path_str): return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A") try: excel_file = pd.ExcelFile(file_path_str, engine = "openpyxl") sheet_names = excel_file.sheet_names sheet_to_read: Union[str, int] = 0 sheet_name_read: str = sheet_names[0] if input_data.sheet_name is not None: if isinstance(input_data.sheet_name, int): if 0 requirements.txt Step 10: Creating a GitHub Repository If you don't have a GitHub account: Go to github.com and sign u

In this tutorial, We'll show you how to build a Streamlit application that can read Excel files and generate summaries using large language models (LLMs) like GPT-4 or Claude. We'll use the Karo framework to simplify our interaction with these AI models, creating a tool that reads Excel data and produces insightful summaries and key takeaways.
What We're Building
Our application will:
- Allow users to upload Excel files
- Read and display a preview of the data
- Use OpenAI or Anthropic APIs to generate insights about the data
- Present summaries and key takeaways in a user-friendly interface
Prerequisites
- Basic knowledge of Python
- A computer with Python installed (3.8 or higher recommended)
- An OpenAI API key or Anthropic API key (you'll need to have funds in your account)
- A GitHub account (for deployment)
Step 1: Setting Up Your Development Environment
Let's start by creating a dedicated project folder and setting up a virtual environment:
# Create project directory
mkdir excel_summarizer
cd excel_summarizer
# Create a virtual environment
# For Windows
python -m venv venv
# For macOS/Linux
python3 -m venv venv
# Activate the virtual environment
# For Windows
venv\Scripts\activate
# For macOS/Linux
source venv/bin/activate
Next, open the project in your favorite code editor. If you're using Visual Studio Code:
code .
Step 2: Installing Required Packages
Install the necessary packages:
pip install karo streamlit python-dotenv pandas openpyxl chromadb
Note: We're installing chromadb as it works with the base agent.
Step 3: Creating the Excel Reader Tool
Create a new file called excel_tool_reader.py with the following code:
import pandas as pd
import os
import logging
from typing import List, Dict, Any, Optional, Union
from pydantic import Field, FilePath
from karo.tools.base_tool import BaseTool, BaseToolInputSchema, BaseToolOutputSchema
logger = logging.getLogger(__name__)
class ExcelReaderInput(BaseToolInputSchema):
"""Input schema for the ExcelReader tool."""
file_path: FilePath = Field(..., description="Path to the Excel file.")
sheet_name: Optional[Union[str, int]] = Field(None, description="Specific name of sheet")
max_rows: Optional[int] = Field(100, description="Maximum number of rows to read from each sheet.")
max_cols: Optional[int] = Field(20, description="Maximum number of columns to read from each sheet.")
class ExcelReaderOutput(BaseToolOutputSchema):
"""Output schema for the ExcelReader tool."""
file_path: str = Field(..., description="The path of file that was read")
sheet_name_read: str = Field(..., description="The name of the sheet that was actually read")
data_preview: Optional[str] = Field(None, description="A string representation of the first few rows/columns of the data")
row_count: Optional[int] = Field(None, description="The total number of rows read")
column_names: Optional[List[str]] = Field(None, description="List of column names read (up to max_cols)")
class ExcelReader(BaseTool):
"""Tool to read data from an Excel file."""
name = "excel_reader"
description: str = "Reads data from an Excel file and returns a preview of the data."
input_schema = ExcelReaderInput
output_schema = ExcelReaderOutput
def __init__(self, config: Optional[Any] = None):
"""Initialize the ExcelReader tool."""
logger.info("ExcelReaderTool initialized.")
pass
def run(self, input_data: ExcelReaderInput) -> ExcelReaderOutput:
"""Reads the specificied Excel file and returns a data preview."""
# Read the specified sheet from the Excel file
try:
import openpyxl
except ImportError:
logger.error("openpyxl is not installed. Please install it to read Excel files.")
return self.output_schema(success=False, error_message="openpyxl is not installed.", file_path=str(input_data.file_path), sheet_name_read="N/A")
if not isinstance(input_data, self.input_schema):
return self.output_schema(success=False, error_message="Invalid input data format.", file_path=str(input_data.file_path), sheet_name_read="N/A")
file_path_str = str(input_data.file_path)
if not os.path.exists(file_path_str):
return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A")
try:
excel_file = pd.ExcelFile(file_path_str, engine = "openpyxl")
sheet_names = excel_file.sheet_names
sheet_to_read: Union[str, int] = 0
sheet_name_read: str = sheet_names[0]
if input_data.sheet_name is not None:
if isinstance(input_data.sheet_name, int):
if 0 <= input_data.sheet_name < len(sheet_names):
sheet_to_read = input_data.sheet_name
sheet_name_read = sheet_names[sheet_to_read]
else:
return self.output_schema(success=False, error_message=f"Sheet index out of range: {input_data.sheet_name}", file_path=file_path_str, sheet_name_read="N/A")
elif isinstance(input_data.sheet_name, str):
if input_data.sheet_name in sheet_names:
sheet_to_read = input_data.sheet_name
sheet_name_read = input_data.sheet_name
else:
return self.output_schema(success=False, error_message=f"Sheet name not found: {input_data.sheet_name}", file_path=file_path_str, sheet_name_read="N/A")
header_df = pd.read_excel(excel_file, sheet_name=sheet_to_read, nrows=0)
all_columns = header_df.columns.tolist()
cols_to_use = all_columns[:input_data.max_cols] if input_data.max_cols else all_columns
df = pd.read_excel(excel_file, sheet_name=sheet_to_read, usecols=cols_to_use, nrows=input_data.max_rows)
preview_rows = min(len(df), 10)
data_preview_str = df.head(preview_rows).to_markdown(index=False)
logger.info(f"Successfully read {len(df)} rows and {len(df.columns)} columns from sheet '{sheet_name_read}' in '{file_path_str}'.")
return self.output_schema(
success=True,
file_path=file_path_str,
sheet_name_read=sheet_name_read,
data_preview=data_preview_str,
row_count=len(df),
column_names=df.columns.tolist()
)
except FileNotFoundError:
logger.error(f"File not found: {file_path_str}")
return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A")
except Exception as e:
logger.error(f"Error reading Excel file '{file_path_str}': {e}", exc_info=True)
return self.output_schema(success=False, error_message=f"Error reading Excel file: {e}", file_path=file_path_str, sheet_name_read="N/A")
Step 4: Creating the Streamlit Application
Create a file called streamlit_app.py:
import streamlit as st
import pandas as pd
import os
import tempfile
from dotenv import load_dotenv
from karo.prompts.system_prompt_builder import SystemPromptBuilder
from rich.console import Console
from pydantic import Field
from typing import List, Dict, Any
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
dotenv_path = os.path.join(os.path.dirname(__file__), '.env')
load_dotenv(dotenv_path=dotenv_path)
from karo.core.base_agent import BaseAgent, BaseAgentConfig
from karo.providers.openai_provider import OpenAIProvider, OpenAIProviderConfig
from karo.providers.anthropic_provider import AnthropicProvider, AnthropicProviderConfig
from karo.schemas.base_schemas import BaseInputSchema, BaseOutputSchema, AgentErrorSchema
from excel_tool_reader import ExcelReaderInput, ExcelReaderOutput, ExcelReader
console = Console()
st.set_page_config(page_title="Excel Reader Tool", page_icon="