Building an Excel Summarizer Using Streamlit, Karo, and LLMs

In this tutorial, We'll show you how to build a Streamlit application that can read Excel files and generate summaries using large language models (LLMs) like GPT-4 or Claude. We'll use the Karo framework to simplify our interaction with these AI models, creating a tool that reads Excel data and produces insightful summaries and key takeaways. What We're Building Our application will: Allow users to upload Excel files Read and display a preview of the data Use OpenAI or Anthropic APIs to generate insights about the data Present summaries and key takeaways in a user-friendly interface Prerequisites Basic knowledge of Python A computer with Python installed (3.8 or higher recommended) An OpenAI API key or Anthropic API key (you'll need to have funds in your account) A GitHub account (for deployment) Step 1: Setting Up Your Development Environment Let's start by creating a dedicated project folder and setting up a virtual environment: # Create project directory mkdir excel_summarizer cd excel_summarizer # Create a virtual environment # For Windows python -m venv venv # For macOS/Linux python3 -m venv venv # Activate the virtual environment # For Windows venv\Scripts\activate # For macOS/Linux source venv/bin/activate Next, open the project in your favorite code editor. If you're using Visual Studio Code: code . Step 2: Installing Required Packages Install the necessary packages: pip install karo streamlit python-dotenv pandas openpyxl chromadb Note: We're installing chromadb as it works with the base agent. Step 3: Creating the Excel Reader Tool Create a new file called excel_tool_reader.py with the following code: import pandas as pd import os import logging from typing import List, Dict, Any, Optional, Union from pydantic import Field, FilePath from karo.tools.base_tool import BaseTool, BaseToolInputSchema, BaseToolOutputSchema logger = logging.getLogger(__name__) class ExcelReaderInput(BaseToolInputSchema): """Input schema for the ExcelReader tool.""" file_path: FilePath = Field(..., description="Path to the Excel file.") sheet_name: Optional[Union[str, int]] = Field(None, description="Specific name of sheet") max_rows: Optional[int] = Field(100, description="Maximum number of rows to read from each sheet.") max_cols: Optional[int] = Field(20, description="Maximum number of columns to read from each sheet.") class ExcelReaderOutput(BaseToolOutputSchema): """Output schema for the ExcelReader tool.""" file_path: str = Field(..., description="The path of file that was read") sheet_name_read: str = Field(..., description="The name of the sheet that was actually read") data_preview: Optional[str] = Field(None, description="A string representation of the first few rows/columns of the data") row_count: Optional[int] = Field(None, description="The total number of rows read") column_names: Optional[List[str]] = Field(None, description="List of column names read (up to max_cols)") class ExcelReader(BaseTool): """Tool to read data from an Excel file.""" name = "excel_reader" description: str = "Reads data from an Excel file and returns a preview of the data." input_schema = ExcelReaderInput output_schema = ExcelReaderOutput def __init__(self, config: Optional[Any] = None): """Initialize the ExcelReader tool.""" logger.info("ExcelReaderTool initialized.") pass def run(self, input_data: ExcelReaderInput) -> ExcelReaderOutput: """Reads the specificied Excel file and returns a data preview.""" # Read the specified sheet from the Excel file try: import openpyxl except ImportError: logger.error("openpyxl is not installed. Please install it to read Excel files.") return self.output_schema(success=False, error_message="openpyxl is not installed.", file_path=str(input_data.file_path), sheet_name_read="N/A") if not isinstance(input_data, self.input_schema): return self.output_schema(success=False, error_message="Invalid input data format.", file_path=str(input_data.file_path), sheet_name_read="N/A") file_path_str = str(input_data.file_path) if not os.path.exists(file_path_str): return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A") try: excel_file = pd.ExcelFile(file_path_str, engine = "openpyxl") sheet_names = excel_file.sheet_names sheet_to_read: Union[str, int] = 0 sheet_name_read: str = sheet_names[0] if input_data.sheet_name is not None: if isinstance(input_data.sheet_name, int): if 0 requirements.txt Step 10: Creating a GitHub Repository If you don't have a GitHub account: Go to github.com and sign u

May 6, 2025 - 03:16
 0
Building an Excel Summarizer Using Streamlit, Karo, and LLMs

In this tutorial, We'll show you how to build a Streamlit application that can read Excel files and generate summaries using large language models (LLMs) like GPT-4 or Claude. We'll use the Karo framework to simplify our interaction with these AI models, creating a tool that reads Excel data and produces insightful summaries and key takeaways.

What We're Building

Our application will:

  • Allow users to upload Excel files
  • Read and display a preview of the data
  • Use OpenAI or Anthropic APIs to generate insights about the data
  • Present summaries and key takeaways in a user-friendly interface

Prerequisites

  • Basic knowledge of Python
  • A computer with Python installed (3.8 or higher recommended)
  • An OpenAI API key or Anthropic API key (you'll need to have funds in your account)
  • A GitHub account (for deployment)

Step 1: Setting Up Your Development Environment

Let's start by creating a dedicated project folder and setting up a virtual environment:

# Create project directory
mkdir excel_summarizer
cd excel_summarizer

# Create a virtual environment
# For Windows
python -m venv venv

# For macOS/Linux
python3 -m venv venv

# Activate the virtual environment
# For Windows
venv\Scripts\activate

# For macOS/Linux
source venv/bin/activate

Next, open the project in your favorite code editor. If you're using Visual Studio Code:

code .

Step 2: Installing Required Packages

Install the necessary packages:

pip install karo streamlit python-dotenv pandas openpyxl chromadb

Note: We're installing chromadb as it works with the base agent.

Step 3: Creating the Excel Reader Tool

Create a new file called excel_tool_reader.py with the following code:

import pandas as pd
import os
import logging
from typing import List, Dict, Any, Optional, Union
from pydantic import Field, FilePath

from karo.tools.base_tool import BaseTool, BaseToolInputSchema, BaseToolOutputSchema

logger = logging.getLogger(__name__)

class ExcelReaderInput(BaseToolInputSchema):
    """Input schema for the ExcelReader tool."""
    file_path: FilePath = Field(..., description="Path to the Excel file.")
    sheet_name: Optional[Union[str, int]] = Field(None, description="Specific name of sheet")
    max_rows: Optional[int] = Field(100, description="Maximum number of rows to read from each sheet.")
    max_cols: Optional[int] = Field(20, description="Maximum number of columns to read from each sheet.")

class ExcelReaderOutput(BaseToolOutputSchema):
    """Output schema for the ExcelReader tool."""
    file_path: str = Field(..., description="The path of file that was read")
    sheet_name_read: str = Field(..., description="The name of the sheet that was actually read")
    data_preview: Optional[str] = Field(None, description="A string representation of the first few rows/columns of the data")
    row_count: Optional[int] = Field(None, description="The total number of rows read")
    column_names: Optional[List[str]] = Field(None, description="List of column names read (up to max_cols)")

class ExcelReader(BaseTool):
    """Tool to read data from an Excel file."""
    name = "excel_reader"
    description: str = "Reads data from an Excel file and returns a preview of the data."
    input_schema = ExcelReaderInput
    output_schema = ExcelReaderOutput

    def __init__(self, config: Optional[Any] = None):
        """Initialize the ExcelReader tool."""
        logger.info("ExcelReaderTool initialized.")
        pass

    def run(self, input_data: ExcelReaderInput) -> ExcelReaderOutput:
        """Reads the specificied Excel file and returns a data preview."""

        # Read the specified sheet from the Excel file
        try:
            import openpyxl
        except ImportError:
            logger.error("openpyxl is not installed. Please install it to read Excel files.")
            return self.output_schema(success=False, error_message="openpyxl is not installed.", file_path=str(input_data.file_path), sheet_name_read="N/A")

        if not isinstance(input_data, self.input_schema):
            return self.output_schema(success=False, error_message="Invalid input data format.", file_path=str(input_data.file_path), sheet_name_read="N/A")

        file_path_str = str(input_data.file_path)

        if not os.path.exists(file_path_str):
            return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A")

        try:
            excel_file = pd.ExcelFile(file_path_str, engine = "openpyxl")
            sheet_names = excel_file.sheet_names
            sheet_to_read: Union[str, int] = 0 
            sheet_name_read: str = sheet_names[0]

            if input_data.sheet_name is not None:
                if isinstance(input_data.sheet_name, int):
                    if 0 <= input_data.sheet_name < len(sheet_names):
                        sheet_to_read = input_data.sheet_name
                        sheet_name_read = sheet_names[sheet_to_read]
                    else:
                        return self.output_schema(success=False, error_message=f"Sheet index out of range: {input_data.sheet_name}", file_path=file_path_str, sheet_name_read="N/A")
                elif isinstance(input_data.sheet_name, str):
                    if input_data.sheet_name in sheet_names:
                        sheet_to_read = input_data.sheet_name
                        sheet_name_read = input_data.sheet_name
                    else:
                        return self.output_schema(success=False, error_message=f"Sheet name not found: {input_data.sheet_name}", file_path=file_path_str, sheet_name_read="N/A")

            header_df = pd.read_excel(excel_file, sheet_name=sheet_to_read, nrows=0)
            all_columns = header_df.columns.tolist()
            cols_to_use = all_columns[:input_data.max_cols] if input_data.max_cols else all_columns

            df = pd.read_excel(excel_file, sheet_name=sheet_to_read, usecols=cols_to_use, nrows=input_data.max_rows)

            preview_rows = min(len(df), 10)
            data_preview_str = df.head(preview_rows).to_markdown(index=False)

            logger.info(f"Successfully read {len(df)} rows and {len(df.columns)} columns from sheet '{sheet_name_read}' in '{file_path_str}'.")

            return self.output_schema(
                success=True,
                file_path=file_path_str,
                sheet_name_read=sheet_name_read,
                data_preview=data_preview_str,
                row_count=len(df),
                column_names=df.columns.tolist()
            )
        except FileNotFoundError:
            logger.error(f"File not found: {file_path_str}")
            return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A")
        except Exception as e:
            logger.error(f"Error reading Excel file '{file_path_str}': {e}", exc_info=True)
            return self.output_schema(success=False, error_message=f"Error reading Excel file: {e}", file_path=file_path_str, sheet_name_read="N/A")

Step 4: Creating the Streamlit Application

Create a file called streamlit_app.py:

import streamlit as st
import pandas as pd
import os
import tempfile
from dotenv import load_dotenv
from karo.prompts.system_prompt_builder import SystemPromptBuilder
from rich.console import Console
from pydantic import Field
from typing import List, Dict, Any
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

dotenv_path = os.path.join(os.path.dirname(__file__), '.env')
load_dotenv(dotenv_path=dotenv_path)

from karo.core.base_agent import BaseAgent, BaseAgentConfig
from karo.providers.openai_provider import OpenAIProvider, OpenAIProviderConfig
from karo.providers.anthropic_provider import AnthropicProvider, AnthropicProviderConfig
from karo.schemas.base_schemas import BaseInputSchema, BaseOutputSchema, AgentErrorSchema
from excel_tool_reader import ExcelReaderInput, ExcelReaderOutput, ExcelReader

console = Console()

st.set_page_config(page_title="Excel Reader Tool", page_icon="