The Ultimate Guide to LiteLLM Callbacks: From Zero to Hero for Beginners

You're using LiteLLM to effortlessly manage calls to various Large Language Models (LLMs) like GPT-4, Claude, Gemini, and more. It's fantastic! But have you ever wished you could peek behind the curtain? Maybe log exactly what prompts are being sent, track your spending in real-time, get alerts when things go wrong, or send detailed results to your own database?
If so, then LiteLLM Callbacks are the feature you've been looking for. They are LiteLLM's powerful mechanism for letting you hook into the lifecycle of an AI request and execute your own code at critical moments.
This guide is designed to take you from a complete beginner ("What even is a callback?") to someone who can confidently implement custom logic for advanced observability and integration. We'll break it down step-by-step, with plenty of examples.
Who is this guide for?
- Beginners: New to LiteLLM or callbacks? We'll start with the basics.
- Intermediate Users: Want to log costs, errors, or send data to specific tools? We've got you covered.
- Advanced Users: Need fine-grained control, async handling, or proxy integration? We'll dive deep.
Our Goal: By the end of this guide, you'll understand callbacks deeply and feel empowered to integrate LiteLLM with almost anything you need.
Part 1: The "Why" and "What" - Callback Foundations
Let's start with the fundamentals.
1. Introduction: What Are Callbacks?
Imagine you're sending an important package using a delivery service. You'd probably want updates, right?
- "Your package has been picked up." (Before sending)
- "Your package is out for delivery." (In progress)
- "Your package has been successfully delivered!" (Success)
- "There was an issue delivering your package." (Failure)
LiteLLM Callbacks are exactly like these notifications, but for your AI requests.
Technically speaking, a callback is a function you provide to LiteLLM that LiteLLM promises to execute ("call back") automatically when a specific event happens during the processing of an AI request (like `litellm.completion()` or `litellm.embedding()`).
There are two main "flavors" of callbacks in LiteLLM, which we'll explore in detail:
- Built-in Callbacks: Super easy ways to send data to popular, pre-supported services (like Sentry, Helicone, Slack). Think of these as pre-configured notification settings.
- Custom Callbacks: Maximum flexibility. You write your own Python code to do literally anything you want when these events occur. This is like setting up your own custom notification system with precise instructions.
2. Why Should I Care? The Awesome Benefits of Callbacks
Okay, sounds interesting, but why use them? Callbacks unlock a world of possibilities:
- **Crystal-Clear Observability:** See exactly what prompts are being sent, which models are being used, and what responses are coming back. Invaluable for debugging!
  - Example: Track down why a specific user is getting weird responses by logging their exact prompts and the model's replies.
- **Precise Cost Tracking:** Know exactly how much each API call costs, almost in real-time. Stop guessing your monthly bill!
  - Example: Log the cost of every successful call to a spreadsheet or database to monitor spending per project or user.
- **Performance Monitoring:** Measure how long requests are taking (latency). Identify slow models or network issues.
  - Example: Alert if the average response time for `gpt-4` exceeds 5 seconds.
- **Proactive Error Handling & Alerting:** Don't wait for users to complain. Get notified immediately when an API call fails (e.g., invalid API key, rate limit exceeded).
  - Example: Send an urgent notification to a Slack channel whenever a production API key fails validation.
- **Centralized Data Logging:** Automatically send request details, responses, usage stats, and metadata to your preferred databases, logging platforms, or analytics tools.
  - Example: Store every request, its response, token count, and user ID in a Supabase table for later analysis.
- **Custom Workflow Integration:** Trigger specific actions in your application based on the outcome of an LLM call.
  - Example: If a sentiment analysis call returns "positive", automatically add the customer ticket ID to a follow-up queue.
Essentially, callbacks give you the visibility and control you need to run AI-powered applications reliably, efficiently, and cost-effectively.
Part 2: The Easy Way - Built-in Callback Integrations
Let's start with the simplest way to get value from callbacks: using LiteLLM's built-in support for popular third-party services.
3. Getting Started Fast: Using Built-in Callbacks
Concept: LiteLLM already knows how to talk to several common tools used for logging, monitoring, and analytics. You just need to tell LiteLLM which tools to notify and when.
How it Works: You assign a list of service names (as Python strings) to special variables within the `litellm` library:
- `litellm.input_callback`: Services listed here get notified before the API call is sent. Useful for logging the exact input that might later cause a failure.
- `litellm.success_callback`: Services listed here get notified only after an API call completes successfully. Perfect for logging results, usage, and costs.
- `litellm.failure_callback`: Services listed here get notified only if an API call fails. Ideal for error tracking and alerting.
Configuration is Key: For these built-in integrations to work, LiteLLM needs the credentials for each service. You must provide these, usually through environment variables. For example, to use Helicone you need to set the `HELICONE_API_KEY` environment variable; for Sentry, you need `SENTRY_DSN`; for Slack, `SLACK_WEBHOOK_URL`. Check the LiteLLM Observability Docs for the specific environment variables required for each integration.
Example 1: Logging Successful Calls to Helicone
Helicone is great for monitoring LLM usage and costs.
```python
import litellm
import os

# IMPORTANT: Set your Helicone API key as an environment variable.
# You might do this in your .env file, shell profile, or deployment settings.
# os.environ["HELICONE_API_KEY"] = "YOUR_HELICONE_API_KEY"  # Or set it externally

# Tell LiteLLM to notify Helicone on successful calls
litellm.success_callback = ["helicone"]

print("Sending a request...")
try:
    response = litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What is LiteLLM?"}]
    )
    print("Request successful!")
    # LiteLLM automatically sent the request/response details,
    # cost, usage, etc., to your Helicone dashboard behind the scenes.
    # print(response)
except Exception as e:
    print(f"Request failed: {e}")

# You can also clear callbacks if needed
# litellm.success_callback = []
```
Example 2: Logging Failures to Sentry
Sentry is excellent for tracking errors in applications.
```python
import litellm
import os

# IMPORTANT: Set your Sentry DSN as an environment variable
# os.environ["SENTRY_DSN"] = "YOUR_SENTRY_DSN_URL"  # Or set it externally

# Tell LiteLLM to notify Sentry when calls fail
litellm.failure_callback = ["sentry"]

print("Sending a request likely to fail (invalid key)...")
try:
    response = litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "This should fail."}],
        api_key="bad-key"  # Intentionally providing an invalid key
    )
    print("Request surprisingly succeeded?")
except Exception as e:
    print(f"Request failed as expected: {e}")
    # LiteLLM automatically sent detailed error information,
    # including the input prompt and exception details, to Sentry.
```
Example 3: Using Multiple Callbacks
You can combine multiple services for different events.
```python
import litellm
import os

# Assume these environment variables are set:
# HELICONE_API_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST,
# SENTRY_DSN, SLACK_WEBHOOK_URL

# Log input to Sentry (useful for debugging before failure)
litellm.input_callback = ["sentry"]
# Log success to Helicone AND Langfuse
litellm.success_callback = ["helicone", "langfuse"]
# Log failures to Sentry AND send a Slack alert
litellm.failure_callback = ["sentry", "slack"]

print("Running example with multiple callbacks...")
try:
    # Make a successful call
    response_ok = litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Tell me a short joke."}],
        user="user-123"
    )
    print("First request successful!")
    # -> Sent to Sentry (input), Helicone (success), Langfuse (success)

    # Make a failing call
    response_fail = litellm.completion(
        model="gpt-4",  # Assuming no key configured for this
        messages=[{"role": "user", "content": "This will likely fail."}],
        user="user-456"
    )
except Exception as e:
    print(f"Second request failed as expected: {e}")
    # -> Sent to Sentry (input), Sentry (failure), Slack (failure)
```
Supported Services (Partial List):
LiteLLM integrates with many tools out-of-the-box, including:
- Helicone
- Langfuse
- Sentry
- PostHog
- Slack
- Lunary
- Athina
- Traceloop
- Logfire
- Datadog
- Grafana Loki
- ... and more! Check the official docs for the full, up-to-date list and setup instructions.
Pros & Cons of Built-in Callbacks:
- Pros: Incredibly easy to set up for supported services. No need to write custom logging code. Leverages LiteLLM's well-tested integrations.
- Cons: Limited to the services LiteLLM explicitly supports. Less flexibility if you need custom data formatting, want to log to your own internal system, or need to perform custom logic based on the call results.
Built-in callbacks are fantastic for getting started quickly with popular observability tools. But what if you need more?
Part 3: The Flexible Way - Custom Callbacks
This is where the real power lies. Custom callbacks let you execute any Python code you want during the API call lifecycle. Log to your own database? Calculate custom metrics? Call another API? Trigger complex workflows? Yes, you can!
4. Taking Control: Introduction to Custom Callbacks
Concept: Instead of just providing service names, you provide LiteLLM with your own Python class containing specific methods (functions) that LiteLLM will run at the designated callback events.
The Core Tool: You'll create a class that inherits from `litellm.integrations.custom_logger.CustomLogger`. Think of `CustomLogger` as a template or blueprint that defines which methods LiteLLM expects you to potentially implement.
Step 1: Create Your Handler Class (The Blueprint)
First, create a new Python file (e.g., `my_custom_callbacks.py`). Inside this file, define your class:
```python
# In file: my_custom_callbacks.py
from litellm.integrations.custom_logger import CustomLogger
import litellm    # Often needed for utilities like cost calculation
import datetime   # Useful for timestamps

# Define your custom instruction set by inheriting from CustomLogger
class MyVeryOwnHandler(CustomLogger):
    def __init__(self):
        """Optional: Initialize things your handler might need."""
        print("MyVeryOwnHandler is ALIVE!")
        self.request_count = 0

    # --- We will implement specific methods below ---
```
Step 2: Implement Key Methods (Your Specific Instructions)
Now, inside your class, you override the methods corresponding to the events you care about. You don't need to implement all of them, only the ones you want to react to.
Here are the most common synchronous methods:
- `log_pre_api_call(self, model, messages, kwargs)`: Runs before LiteLLM sends the request to the actual LLM API.
- `log_post_api_call(self, kwargs, response_obj, start_time, end_time)`: Runs immediately after the LLM API responds (or fails), but before the specific success/failure logic.
- `log_success_event(self, kwargs, response_obj, start_time, end_time)`: Runs only if the API call was successful.
- `log_failure_event(self, kwargs, response_obj, start_time, end_time)`: Runs only if the API call failed.
- `log_stream_event(self, kwargs, response_obj, start_time, end_time)`: Runs for each chunk received when using `stream=True`.
Basic Example: Simple Printing Handler
Let's create a handler that just prints messages to the console when events happen.
```python
# In file: my_custom_callbacks.py (Continuing the class)
from litellm.integrations.custom_logger import CustomLogger
import litellm
import datetime

class MyPrintingHandler(CustomLogger):
    def __init__(self):
        print("MyPrintingHandler Initialized!")

    def log_pre_api_call(self, model, messages, kwargs):
        """Called before the actual API call."""
        print("\n------------------------------")
        print(f"[MyHandler - PRE] About to call model: {model}")
        # You could print messages here too, but they can be large
        # print(f"[MyHandler - PRE] Prompt Messages: {messages}")
        print(f"[MyHandler - PRE] Extra Args (kwargs): {kwargs.get('extra_body', {})}")  # Example of getting a specific kwarg

    def log_post_api_call(self, kwargs, response_obj, start_time, end_time):
        """Called right after the API call returns (success or failure)."""
        duration = (end_time - start_time).total_seconds()
        print(f"[MyHandler - POST] Call finished. Duration: {duration:.4f} seconds")

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        """Called ONLY on successful API calls."""
        print("[MyHandler - SUCCESS] API Call was successful!")
        # Let's get the model used (might differ from input if routing)
        actual_model = response_obj.model  # Accessing attribute directly on response object
        usage = response_obj.usage         # Accessing usage info
        print(f"[MyHandler - SUCCESS] Model used: {actual_model}, Usage: {usage}")
        # We'll dive deeper into kwargs and response_obj soon!

    def log_failure_event(self, kwargs, response_obj, start_time, end_time):
        """Called ONLY on failed API calls."""
        print("[MyHandler - FAILURE] API Call failed!")
        # Extract the error from kwargs if possible
        exception = kwargs.get("exception")
        if exception:
            print(f"[MyHandler - FAILURE] Error Type: {type(exception).__name__}")
            print(f"[MyHandler - FAILURE] Error Details: {exception}")
        # The response_obj might also contain error info from the provider
        # print(f"[MyHandler - FAILURE] Response Object: {response_obj}")

    def log_stream_event(self, kwargs, response_obj, start_time, end_time):
        """Called for each chunk during streaming."""
        print(f"[MyHandler - STREAM] Received chunk: {response_obj}")

# IMPORTANT: Create an instance (a specific object) of your handler class.
# This is what you'll actually give to LiteLLM.
my_printer_instance = MyPrintingHandler()
```
Step 3: Register Your Handler (Tell LiteLLM to Use It)
How you register depends on how you're using LiteLLM:
- A) Direct Library Usage (e.g., in a script): You assign a list containing your handler instance to `litellm.callbacks`.

```python
import litellm
# Import your handler instance from the file where you defined it
from my_custom_callbacks import my_printer_instance

# Register the handler
litellm.callbacks = [my_printer_instance]

# Now, make a LiteLLM call
print("\nMaking a direct library call...")
try:
    response = litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What time is it?"}],
        # Example of passing extra data that might appear in kwargs['metadata']
        metadata={"customer_id": "cust_abc", "request_source": "script"}
    )
    print("Direct call finished.")
    # Your MyPrintingHandler methods should have printed messages!
except Exception as e:
    print(f"Direct call failed: {e}")
    # Your MyPrintingHandler failure methods should have printed messages!

# You can have multiple custom handlers!
# from another_module import another_handler_instance
# litellm.callbacks = [my_printer_instance, another_handler_instance]

# To remove callbacks:
# litellm.callbacks = []
```
- B) LiteLLM Proxy Usage: This is handled differently. You don't set `litellm.callbacks` in your Python code. Instead, you tell the proxy about your handler in its configuration file (`config.yaml` or similar). We'll cover this in detail in Part 4.
Now you know how to create and register a basic custom handler. But to make it truly useful, you need to understand the data LiteLLM gives you within these methods.
5. Unpacking the Treasure Chest: What Callbacks Receive (`kwargs` & `response_obj`)
This is arguably the most important section for mastering custom callbacks. The `kwargs` dictionary and the `response_obj` object are how LiteLLM passes crucial information into your callback methods.
`kwargs` (Keyword Arguments Dictionary):
Think of `kwargs` as a big dictionary (a bag) filled with context about the specific API call being made. The exact contents can vary slightly depending on which callback method is being called (e.g., `exception` only appears on failure) and how LiteLLM was invoked, but here are the most common and useful items:
- `model` (str): The requested model name (e.g., `"gpt-3.5-turbo"`).
  - Access: `requested_model = kwargs.get("model")`
- `messages` (list): The list of message dictionaries sent as the prompt.
  - Access: `prompt_messages = kwargs.get("messages", [])`
- `user` (str, optional): The user identifier you passed to `litellm.completion(user=...)`.
  - Access: `user_id = kwargs.get("user")`
- `litellm_call_id` (str): A unique UUID assigned by LiteLLM to this specific request attempt. Useful for tracing.
  - Access: `call_id = kwargs.get("litellm_call_id")`
- `litellm_params` (dict): A dictionary containing additional parameters related to the LiteLLM call setup. Often contains `metadata`.
  - Access: `params = kwargs.get("litellm_params", {})`
- `metadata` (dict, inside `litellm_params`): Any extra data you passed in the `metadata={...}` argument to `litellm.completion`. Great for passing your own tracking IDs, session info, etc.
  - Access: `metadata = kwargs.get("litellm_params", {}).get("metadata", {})`
  - Example: `customer = metadata.get("customer_id")`
- `response_cost` (float, often in success/failure events): The calculated cost of the completion, if available. LiteLLM calculates this for you.
  - Access: `cost = kwargs.get("response_cost")`
- `exception` (Exception object, only in failure events): The Python exception object that was raised.
  - Access: `error_obj = kwargs.get("exception")`
- `traceback_exception` (str, only in failure events): A string containing the formatted Python traceback for the exception.
  - Access: `traceback_str = kwargs.get("traceback_exception")`
- `proxy_server_request` (dict, only when using the LiteLLM Proxy, inside `litellm_params`): Contains details about the original HTTP request made to the proxy itself (URL, headers, body).
  - Access: `proxy_req = kwargs.get("litellm_params", {}).get("proxy_server_request")`
  - Example: `request_headers = proxy_req.get("headers", {}) if proxy_req else {}`
- `model_info` (dict, only when using the LiteLLM Proxy, inside `litellm_params`): Information about the model configuration from your `config.yaml` (if you set `model_info`).
  - Access: `model_cfg = kwargs.get("litellm_params", {}).get("model_info")`
- `start_time` / `end_time` (datetime objects): Passed directly as arguments, not usually inside `kwargs`. Used for calculating duration: `duration = end_time - start_time`.
`response_obj` (Response Object):
This object represents the actual result returned from the underlying LLM API provider (like OpenAI, Anthropic, etc.) after LiteLLM has potentially standardized it slightly.
- On Success: It's typically a `ModelResponse` object (a LiteLLM structure that behaves like a dictionary) containing:
  - `choices`: A list of choices, usually containing the `message` (with `role` and `content`).
  - `usage`: A dictionary with `prompt_tokens`, `completion_tokens`, `total_tokens`.
  - `model`: The actual model that processed the request (useful if using routing/fallbacks).
  - `id`: The ID returned by the provider.
  - `created`: Timestamp from the provider.
  - ... and potentially other provider-specific fields.
  - Access: You can usually access data like dictionary keys or attributes: `content = response_obj['choices'][0]['message']['content']` or `usage = response_obj.usage`.
- On Failure: The `response_obj` might be `None`, or it could contain an error structure returned by the LLM API provider itself (e.g., an error message from OpenAI). It's less predictable than the success case. Rely more on `kwargs['exception']` for the primary error details within LiteLLM.
- On Stream: In `log_stream_event`, `response_obj` is typically a chunk of the response (e.g., a `StreamingChunk` object), often containing a `delta` field with the new text fragment (see the streaming sketch after the success example below).
Example: Accessing Data in a Success Callback
```python
# Inside your handler class...
import json  # Useful for pretty printing

def log_success_event(self, kwargs, response_obj, start_time, end_time):
    print("\n--- Detailed Success Log ---")
    try:
        # Extract from kwargs
        request_model = kwargs.get("model", "N/A")
        user = kwargs.get("user", "N/A")
        call_id = kwargs.get("litellm_call_id", "N/A")
        cost = kwargs.get("response_cost", 0.0)
        metadata = kwargs.get("litellm_params", {}).get("metadata", {})

        # Extract from response_obj
        actual_model = getattr(response_obj, "model", "N/A")  # Safe attribute access
        usage = getattr(response_obj, "usage", {})            # Safe attribute access
        prompt_tokens = usage.get("prompt_tokens", 0)
        completion_tokens = usage.get("completion_tokens", 0)
        response_id = getattr(response_obj, "id", "N/A")

        # Get first choice content safely
        response_text = "N/A"
        if hasattr(response_obj, "choices") and response_obj.choices:
            message = getattr(response_obj.choices[0], "message", None)
            if message:
                response_text = getattr(message, "content", "N/A")

        # Calculate duration
        duration_ms = (end_time - start_time).total_seconds() * 1000

        print(f"Call ID: {call_id}")
        print(f"User: {user}")
        print(f"Requested Model: {request_model}")
        print(f"Actual Model Used: {actual_model}")
        print(f"Duration: {duration_ms:.2f} ms")
        print(f"Usage: Prompt={prompt_tokens}, Completion={completion_tokens}, Total={usage.get('total_tokens', 0)}")
        print(f"Cost: ${cost:.6f}")
        print(f"Response ID: {response_id}")
        print(f"Response Text (first choice): {response_text[:100]}...")  # Print a snippet
        print(f"Metadata: {json.dumps(metadata)}")  # Pretty print metadata

        # --- In a real app, you'd send this data somewhere ---
        # log_to_database(call_id=call_id, user=user, cost=cost, ...)
        # send_to_analytics(event="llm_success", properties={...})
    except Exception as e:
        print(f"[HANDLER - SUCCESS/ERROR] Failed to process success event: {e}")
```
Understanding `kwargs` and `response_obj` is the key to unlocking the full potential of custom callbacks. Spend time exploring what data is available in different scenarios!
6. Going Async: Asynchronous Custom Callbacks
Modern Python applications often use `asyncio` for better performance, especially when dealing with network calls (like LLM requests). LiteLLM fully supports this, both for making requests (`litellm.acompletion`) and within callbacks. The LiteLLM Proxy itself is heavily asynchronous.
Why Async Callbacks?
If your callback needs to perform network I/O (like writing to a remote database, calling another API, or sending data to an observability platform), using an `async` callback prevents it from blocking the main application thread. This is crucial for maintaining responsiveness, especially under load.
How?
You simply define the callback methods in your handler class using `async def` instead of just `def`. LiteLLM is smart enough to detect these `async` versions and will `await` them correctly when called from an asynchronous context.
The primary async methods you'll likely use are:
- `async_log_success_event(self, kwargs, response_obj, start_time, end_time)`
- `async_log_failure_event(self, kwargs, response_obj, start_time, end_time)`
- `async_log_pre_api_call(self, model, messages, kwargs)`
- `async_log_post_api_call(self, kwargs, response_obj, start_time, end_time)`
- `async_log_stream_event(self, kwargs, response_obj, start_time, end_time)`
Example: Async Success Logging (Similar to Docs)
This example demonstrates an async success logger, which is typical for proxy usage. Notice the `async def` and the conceptual `await log_to_my_service(...)`.
```python
# In file: my_custom_callbacks.py (Continuing)
import asyncio  # Needed for async operations like sleep

async def log_to_my_service(data):
    """Placeholder for your actual async logging function."""
    print("[SERVICE LOG] Simulating async network call to log data...")
    await asyncio.sleep(0.05)  # Simulate network delay
    # In reality: await http_client.post("https://my-logging-service.com/api", json=data)
    # Or: await database_connection.execute("INSERT INTO logs ...", data)
    print(f"[SERVICE LOG] Finished logging: {data.get('call_id')}")

class MyAsyncHandler(CustomLogger):  # Could be the same class or a new one
    # ... include other methods if needed (sync or async) ...

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("\n[MyAsyncHandler - ASYNC SUCCESS] Call successful!")
        try:
            # --- Extract data (same logic as the sync example above) ---
            call_id = kwargs.get("litellm_call_id", "N/A")
            user = kwargs.get("user", "N/A")
            cost = kwargs.get("response_cost", 0.0)
            usage = getattr(response_obj, "usage", {})
            actual_model = getattr(response_obj, "model", "N/A")
            duration = (end_time - start_time).total_seconds()
            metadata = kwargs.get("litellm_params", {}).get("metadata", {})

            log_data = {
                "call_id": call_id,
                "user": user,
                "model": actual_model,
                "cost": cost,
                "duration_sec": duration,
                "usage": dict(usage),  # Convert usage object to dict for logging
                "status": "success",
                "metadata": metadata,
                "timestamp": end_time.isoformat()
            }

            # --- Perform the async logging operation ---
            await log_to_my_service(log_data)
        except Exception as e:
            print(f"[HANDLER - ASYNC SUCCESS/ERROR] Failed to process async success event: {e}")

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        print("\n[MyAsyncHandler - ASYNC FAILURE] Call failed!")
        try:
            # --- Extract data ---
            call_id = kwargs.get("litellm_call_id", "N/A")
            user = kwargs.get("user", "N/A")
            request_model = kwargs.get("model", "N/A")
            exception_obj = kwargs.get("exception")
            exception_type = type(exception_obj).__name__ if exception_obj else "N/A"
            error_message = str(exception_obj) if exception_obj else "N/A"
            duration = (end_time - start_time).total_seconds()
            metadata = kwargs.get("litellm_params", {}).get("metadata", {})

            log_data = {
                "call_id": call_id,
                "user": user,
                "model": request_model,  # Actual model might not be known on failure
                "cost": kwargs.get("response_cost", 0.0),  # May be 0 on failure
                "duration_sec": duration,
                "status": "failure",
                "error_type": exception_type,
                "error_message": error_message,
                "metadata": metadata,
                "timestamp": end_time.isoformat()
            }

            # --- Perform async logging ---
            await log_to_my_service(log_data)  # Log failures too!
        except Exception as e:
            print(f"[HANDLER - ASYNC FAILURE/ERROR] Failed to process async failure event: {e}")

# Create an instance if this is a separate class
my_async_handler_instance = MyAsyncHandler()
```
When are Sync vs. Async methods called?
- If you call `litellm.completion` (sync), LiteLLM will try to call your sync callback methods (`log_success_event`, etc.).
- If you call `litellm.acompletion` (async) or use the LiteLLM Proxy (which is async internally), LiteLLM will prefer to call your async callback methods (`async_log_success_event`, etc.) if you've defined them. If you only defined the sync versions, LiteLLM might still call them, but this could potentially block the async event loop, so defining async versions is highly recommended for async usage.
Rule of thumb: If using the LiteLLM Proxy or `litellm.acompletion`, implement the `async def` versions of the callback methods you need.
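To see this distinction in practice, here's a minimal sketch of a handler that defines both flavors, driven once by `litellm.completion` and once by `litellm.acompletion` (model name and prompt are placeholders; it assumes your provider credentials are set):

```python
# Minimal sketch: a handler with both sync and async success hooks.
# The sync hook fires for litellm.completion, the async hook for litellm.acompletion.
import asyncio
import litellm
from litellm.integrations.custom_logger import CustomLogger

class DualModeHandler(CustomLogger):
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("[DualModeHandler] sync success hook fired")

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("[DualModeHandler] async success hook fired")

litellm.callbacks = [DualModeHandler()]

# Sync path -> sync hook
litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "ping"}],
)

# Async path -> async hook
async def main():
    await litellm.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "ping"}],
    )

asyncio.run(main())
```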
Part 4: Putting It Together - Lifecycle & Implementation
Now let's clarify exactly when these callbacks fire and how to implement them in the two main scenarios: direct library use vs. the proxy.
7. The Callback Lifecycle: When Does What Fire?
Understanding the order of events is crucial for placing your logic correctly. Here's the typical flow for a non-streaming call:
1. Initiation: Your code calls `litellm.completion(...)`, `litellm.acompletion(...)`, or sends an HTTP request to a LiteLLM Proxy endpoint.
2. [Custom] Pre-API Call: If you registered a custom handler, its `log_pre_api_call` (or `async_log_pre_api_call`) method runs.
3. [Built-in] Input Callback: If `litellm.input_callback` is set (e.g., `["sentry"]`), LiteLLM sends data to those services.
4. ---> Network Call: LiteLLM sends the actual request to the target LLM provider (e.g., the OpenAI API).
5. <--- Network Response: LiteLLM receives the response (or an error) from the LLM provider.
6. [Custom] Post-API Call: Your custom handler's `log_post_api_call` (or `async_log_post_api_call`) runs. This happens regardless of success or failure.
7. Outcome Determination: LiteLLM checks if the call succeeded or failed.
8. If Call Succeeded:
   - a. [Custom] Success Event: Your handler's `log_success_event` (or `async_log_success_event`) runs. LiteLLM calculates `response_cost` around this time.
   - b. [Built-in] Success Callback: If `litellm.success_callback` is set (e.g., `["helicone"]`), LiteLLM sends data to those services.
9. If Call Failed:
   - a. [Custom] Failure Event: Your handler's `log_failure_event` (or `async_log_failure_event`) runs. The `exception` and `traceback_exception` are populated in `kwargs`.
   - b. [Built-in] Failure Callback: If `litellm.failure_callback` is set (e.g., `["sentry"]`), LiteLLM sends data to those services.
10. Return/Raise: LiteLLM returns the successful `ModelResponse` object to your code or raises the appropriate Python exception.
Streaming Calls (`stream=True`):
The main difference is step 8a (or 9a if the stream setup fails):
- Instead of `log_success_event` firing once at the end, the [Custom] Stream Event (`log_stream_event` or `async_log_stream_event`) fires multiple times, once for each data chunk received during step 5.
- The final success/failure event might still fire after the stream concludes or if an error terminates the stream. Built-in callbacks for success/failure usually trigger after the stream finishes successfully or fails. (See the sketch below for exercising the stream path.)
8. Implementation Deep Dive: Proxy vs. Direct Library
Let's solidify how to activate your callbacks in these two common contexts.
Scenario 1: Direct Library Usage
- Context: You're importing `litellm` directly into your Python script, web application (like Flask/Django), or other program.
- How to Register:
  1. Define your custom handler class (e.g., `MyCustomHandler`) in a Python file (e.g., `my_handlers.py`).
  2. Create an instance of your handler: `my_instance = MyCustomHandler()`
  3. Import the instance into the script where you make `litellm` calls.
  4. Assign a list containing your instance(s) to `litellm.callbacks`:

```python
from my_handlers import my_instance
import litellm

litellm.callbacks = [my_instance]

# Optional: Add built-in callbacks too
litellm.success_callback = ["helicone"]

# Make your calls
response = litellm.completion(...)  # Sync - triggers sync methods
# or
# async_response = await litellm.acompletion(...)  # Async - triggers async methods preferably
```

- Pros: Simple setup for self-contained applications.
- Cons: Configuration is tied to your code execution; less centralized if you have multiple scripts or entry points.
Scenario 2: LiteLLM Proxy Usage
- Context: You are running the dedicated LiteLLM Proxy server, often in Docker, and your application interacts with the proxy via HTTP requests. This is common for managing keys, routing, and providing a unified API endpoint.
- How to Register: Do NOT set `litellm.callbacks` in code. Configuration happens entirely within the proxy's `config.yaml` file.
1. Define your custom handler class (e.g., `MyProxyHandler`) and create an instance of it at the module level in a Python file (e.g., `proxy_callbacks.py`). Make sure this file is accessible by the proxy (e.g., mounted into the Docker container). Crucially, use `async def` methods in your handler as the proxy operates asynchronously.

```python
# In file: proxy_callbacks.py (accessible to the proxy)
from litellm.integrations.custom_logger import CustomLogger
# ... other imports ...

class MyProxyHandler(CustomLogger):
    # Use async def methods here!
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("[PROXY HANDLER]: Success!")
        # ... your async logging logic ...

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        print("[PROXY HANDLER]: Failure!")
        # ... your async failure logging ...

    # ... other async methods if needed ...

# ---> Create the instance AT THE MODULE LEVEL <---
proxy_logger_instance = MyProxyHandler()
```
2. Edit your proxy's `config.yaml` file and add/modify the `litellm_settings` section:
```yaml
# In file: config.yaml
model_list:
  # ... your model definitions ...

litellm_settings:
  # Option 1: Single custom callback
  callbacks: proxy_callbacks.proxy_logger_instance

  # Option 2: List for multiple custom callbacks (RECOMMENDED for clarity)
  # callbacks: [proxy_callbacks.proxy_logger_instance, another_module.another_instance]

  # You can ALSO set built-in callbacks here! They work alongside custom ones.
  # success_callback: ["helicone"]
  # failure_callback: ["sentry"]
```
* **Explanation of `proxy_callbacks.proxy_logger_instance`:**
* `proxy_callbacks`: Tells the proxy to look for the Python file named `proxy_callbacks.py`.
* `proxy_logger_instance`: Tells the proxy to find the *variable* named `proxy_logger_instance` within that file (which holds our handler object).
3. Start (or restart) the LiteLLM Proxy, ensuring it loads this `config.yaml` and can access your `proxy_callbacks.py` file.
```bash
litellm --config /path/to/your/config.yaml --logs # Add --logs to see output
```
4. Make HTTP requests to your proxy endpoints (e.g., `/chat/completions`). Check the proxy's logs - you should see the output from your `MyProxyHandler` methods!
```bash
curl -X POST http://localhost:4000/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Test proxy callback"}],
"user": "proxy-test-user"
}'
```
*(Expected Proxy Log Output based on example above):*
```
[PROXY HANDLER]: Success!
```
- Pros: Centralized configuration. Keeps callback logic separate from application code. Ideal for microservices or when multiple apps use the same LLM gateway. Leverages the proxy's async nature effectively.
- Cons: Requires managing the proxy setup and ensuring file accessibility (especially with Docker volumes). Debugging might involve looking at proxy logs rather than direct application output.
Part 5: Practical Recipes & Advanced Techniques
Theory is great, but let's see some practical examples!
9. Common Callback Recipes
Here are snippets you can adapt for common tasks within your custom callback methods (primarily shown within async methods, suitable for proxy):
Recipe 1: Logging to a Standard Python Log File
```python
# Add imports at the top of your callback file
import logging
import sys

# Configure logging ONCE (e.g., outside the class or in __init__)
# This setup logs to a file and also the console
log_formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')

log_handler_file = logging.FileHandler("litellm_callbacks.log")
log_handler_file.setFormatter(log_formatter)

log_handler_console = logging.StreamHandler(sys.stdout)
log_handler_console.setFormatter(log_formatter)

logger = logging.getLogger("LiteLLM_Callbacks")
logger.setLevel(logging.INFO)  # Set desired log level
logger.addHandler(log_handler_file)
logger.addHandler(log_handler_console)
logger.propagate = False  # Prevent duplicate logging if root logger is configured

# Inside your handler class...
class MyFileLoggingHandler(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        try:
            call_id = kwargs.get("litellm_call_id")
            model = getattr(response_obj, "model", "N/A")
            cost = kwargs.get("response_cost", 0.0)
            logger.info(f"Success - ID: {call_id}, Model: {model}, Cost: {cost:.6f}")
        except Exception as e:
            logger.error(f"Callback Success Logging Error: {e}", exc_info=True)  # Log traceback

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        try:
            call_id = kwargs.get("litellm_call_id")
            model = kwargs.get("model", "N/A")
            error = kwargs.get("exception", "Unknown Error")
            logger.error(f"Failure - ID: {call_id}, Model: {model}, Error: {error}")
        except Exception as e:
            logger.error(f"Callback Failure Logging Error: {e}", exc_info=True)

# Remember to create an instance: file_logger_instance = MyFileLoggingHandler()
# And register it in litellm.callbacks or config.yaml
```
Recipe 2: Reliable Cost Logging
LiteLLM calculates the cost and usually adds it to `kwargs['response_cost']` in success/failure events.
```python
# Inside your handler class...
class CostTrackingHandler(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        cost = kwargs.get("response_cost")
        call_id = kwargs.get("litellm_call_id")
        if cost is not None:
            print(f"[COST_HANDLER] Success Call {call_id}: Cost = ${cost:.8f}")
            # log_cost_to_db(call_id, cost)
        else:
            print(f"[COST_HANDLER] Success Call {call_id}: Cost info not found in kwargs.")

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        cost = kwargs.get("response_cost")  # Cost might be 0 or None on failure
        call_id = kwargs.get("litellm_call_id")
        if cost is not None:
            print(f"[COST_HANDLER] Failed Call {call_id}: Cost = ${cost:.8f}")
            # log_cost_to_db(call_id, cost)  # You might still log cost on failure
        else:
            print(f"[COST_HANDLER] Failed Call {call_id}: Cost info not found/applicable.")

# Instance: cost_tracker = CostTrackingHandler()
```
Recipe 3: Logging to a Database (Conceptual)
Requires an async database library (like `asyncpg` for PostgreSQL or `motor` for MongoDB).
```python
# Needs a library like 'asyncpg' installed
# import asyncpg
# POOL = None  # Global pool initialized elsewhere
# async def initialize_db_pool():
#     global POOL
#     POOL = await asyncpg.create_pool(user='user', password='pw', database='db', host='host')

# Inside your handler class...
class DatabaseLoggingHandler(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # global POOL
        # if POOL is None: await initialize_db_pool()  # Ensure pool is initialized
        try:
            # Extract all the data you need (call_id, user, cost, usage, model, etc.)
            log_entry = {
                "call_id": kwargs.get("litellm_call_id"),
                "user_id": kwargs.get("user"),
                "model": getattr(response_obj, "model", kwargs.get("model")),
                "prompt_tokens": getattr(response_obj, "usage", {}).get("prompt_tokens"),
                # ... other fields
                "status": "success",
                "response_obj_json": response_obj.json()  # Store full response if needed
            }
            print(f"[DB_HANDLER] Logging success: {log_entry['call_id']}")
            # --- Replace with your actual DB insert ---
            # async with POOL.acquire() as connection:
            #     await connection.execute("""
            #         INSERT INTO litellm_logs (call_id, user_id, ...)
            #         VALUES ($1, $2, ...)
            #     """, log_entry['call_id'], log_entry['user_id'], ...)
            await asyncio.sleep(0.1)  # Simulate DB call
        except Exception as e:
            print(f"[DB_HANDLER] ERROR logging success to DB: {e}")
            # Maybe log this critical failure elsewhere (e.g., file logger)

    # Implement async_log_failure_event similarly

# Instance: db_logger = DatabaseLoggingHandler()
```
Note: Managing DB connections within callbacks requires care (connection pooling, initialization).
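One common pattern is lazy initialization: create the pool the first time a callback needs it and reuse it afterwards. A minimal sketch, assuming `asyncpg` is installed; the connection details and the `litellm_logs` table are placeholders:

```python
# A minimal sketch of lazy connection-pool initialization inside a handler.
# Assumes asyncpg; connection details and table name are placeholders.
import asyncpg
from litellm.integrations.custom_logger import CustomLogger

class PooledDBHandler(CustomLogger):
    def __init__(self):
        self._pool = None  # Created lazily on first use

    async def _get_pool(self):
        if self._pool is None:
            self._pool = await asyncpg.create_pool(
                user="user", password="pw", database="db", host="host",
                min_size=1, max_size=5,
            )
        return self._pool

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        try:
            pool = await self._get_pool()
            async with pool.acquire() as conn:
                await conn.execute(
                    "INSERT INTO litellm_logs (call_id, status) VALUES ($1, $2)",
                    kwargs.get("litellm_call_id"), "success",
                )
        except Exception as e:
            print(f"[POOLED_DB_HANDLER] ERROR: {e}")
```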
Recipe 4: Simple Failure Alerting (Conceptual)
Requires an async HTTP library like `aiohttp`.
```python
# Needs a library like 'aiohttp' installed
# import aiohttp
# SLACK_WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL")  # Get from env

# async def send_slack_alert(message):
#     if not SLACK_WEBHOOK_URL:
#         print("[SLACK ALERT] ERROR: SLACK_WEBHOOK_URL not set.")
#         return
#     payload = {"text": f":rotating_light: LiteLLM Failure Alert :rotating_light:\n{message}"}
#     try:
#         async with aiohttp.ClientSession() as session:
#             async with session.post(SLACK_WEBHOOK_URL, json=payload) as response:
#                 if response.status != 200:
#                     print(f"[SLACK ALERT] Error sending alert: {response.status} {await response.text()}")
#     except Exception as e:
#         print(f"[SLACK ALERT] Exception sending alert: {e}")

# Inside your handler class...
class AlertingHandler(CustomLogger):
    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        try:
            call_id = kwargs.get("litellm_call_id", "N/A")
            model = kwargs.get("model", "N/A")
            user = kwargs.get("user", "N/A")
            error = kwargs.get("exception", "Unknown Error")
            trace = kwargs.get("traceback_exception", "N/A")
            alert_message = (
                f"LiteLLM Call Failed!\n"
                f"*Call ID:* {call_id}\n"
                f"*Model:* {model}\n"
                f"*User:* {user}\n"
                f"*Error:* `{error}`\n"
                # f"Traceback:\n{trace}"  # Traceback can be very long for Slack
            )
            print(f"[ALERT_HANDLER] Sending failure alert for {call_id}")
            # --- Call the async alert function ---
            # await send_slack_alert(alert_message)
            await asyncio.sleep(0.1)  # Simulate network call
        except Exception as e:
            print(f"[ALERT_HANDLER] ERROR processing failure alert: {e}")

# Instance: alerter = AlertingHandler()
```
Recipe 5 & 6: Accessing Proxy Request & Model Info
These are specific to proxy usage and rely on data LiteLLM injects into `kwargs['litellm_params']`.
```python
# Inside your handler class (for proxy usage)...
class ProxyInfoHandler(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        litellm_params = kwargs.get("litellm_params", {})

        # Get original HTTP request details sent to the proxy
        proxy_request = litellm_params.get("proxy_server_request")
        if proxy_request:
            print(f"[PROXY_INFO] Original Request URL: {proxy_request.get('url')}")
            print(f"[PROXY_INFO] Original Request Headers: {proxy_request.get('headers')}")
            # print(f"[PROXY_INFO] Original Request Body: {proxy_request.get('body')}")  # Careful, can be large

        # Get model_info defined in config.yaml
        model_info = litellm_params.get("model_info")
        if model_info:
            print(f"[PROXY_INFO] Configured Model Info: {model_info}")

        # ... rest of your success logging ...

# Instance: proxy_info_logger = ProxyInfoHandler()
```
10. Advanced Considerations
- Error Handling Within Callbacks: Your callback code can fail! Wrap the entire content of your callback methods in a `try...except` block. Log any errors occurring inside the callback itself (perhaps using a simple file logger) so that a failing callback doesn't crash the main LiteLLM request processing.

```python
async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
    try:
        # --- Your Core Callback Logic ---
        # await log_to_db(...)
        pass
    except Exception as callback_error:
        print(f"CRITICAL: Callback failed! Error: {callback_error}", file=sys.stderr)
        # Maybe log callback_error to a separate, ultra-reliable log file
```

- Performance: Callbacks add overhead. Keep the code inside them efficient.
  - Sync Callbacks: Avoid any blocking I/O (network calls, slow disk access) as it will stall the main thread.
  - Async Callbacks: Prefer non-blocking I/O (`asyncio.sleep`, `aiohttp`, `asyncpg`, etc.). Heavy CPU-bound work should ideally be offloaded (e.g., send data to a background task queue like Celery/RQ).
- Combining Callbacks: Yes, you can use both! LiteLLM will execute built-in callbacks AND your custom callback handlers for the same event if both are configured.

```python
# Direct library usage:
litellm.success_callback = ["helicone"]
litellm.callbacks = [MyCustomHandler(), CostTrackingHandler()]

# Proxy config.yaml:
# litellm_settings:
#   success_callback: ["helicone"]
#   callbacks: [my_module.my_handler_instance, cost_module.cost_tracker]
```

- Callback Dependencies: If your custom callback imports libraries (e.g., `requests`, `asyncpg`, `aiohttp`), make sure those libraries are installed in the Python environment where LiteLLM (or the LiteLLM Proxy) is running. This is especially important for Dockerized proxy setups – add them to your `requirements.txt`!
- Stateful Callbacks: Your handler class instance can hold state (like the `request_count` in our first example: `self.request_count += 1`). This allows aggregating data across multiple calls within the same process (see the sketch below). Be cautious: With the proxy, multiple worker processes might exist, each with its own instance of your handler, so state might not be globally shared unless you use external mechanisms (like Redis or a database). Keep it simple unless you have a specific need for state.
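To make the stateful idea concrete, here's a minimal sketch of a handler that keeps in-memory counters (per process only; class and attribute names are illustrative):

```python
# Minimal sketch: per-process in-memory counters.
# Each proxy worker process gets its own instance, so these counts are NOT
# shared globally. Class and attribute names are illustrative.
from litellm.integrations.custom_logger import CustomLogger

class StatsHandler(CustomLogger):
    def __init__(self):
        self.success_count = 0
        self.failure_count = 0

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        self.success_count += 1
        print(f"[STATS] successes={self.success_count}, failures={self.failure_count}")

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        self.failure_count += 1
        print(f"[STATS] successes={self.success_count}, failures={self.failure_count}")

# Instance: stats_handler = StatsHandler()
```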
Part 6: Troubleshooting & Best Practices
You've come this far! Let's wrap up with common issues and key takeaways.
11. Troubleshooting Common Issues
- Callback Not Firing:
  - Check Registration: Did you set `litellm.callbacks = [my_instance]` (direct) or correctly configure `callbacks: module.instance` in `config.yaml` (proxy)?
  - Check Method Names: Do they exactly match (`log_success_event`, `async_log_success_event`, etc.)? Case matters!
  - Check Instance Creation: Did you create an instance of your handler class (`my_instance = MyHandler()`)?
  - Proxy: Is the Python file (`module.py`) accessible to the proxy? Is the instance variable name (`instance`) correct? Restart the proxy after config changes. Check proxy logs (`litellm --config ... --logs`).
- Proxy: `ModuleNotFoundError` or `AttributeError`:
  - The proxy cannot find your Python file (`module.py`) or the instance variable (`instance`) inside it.
  - Check the file path relative to where the proxy is run.
  - Check `PYTHONPATH` if necessary.
  - Verify the spelling of the module and instance name in `config.yaml`.
  - Ensure the instance is created at the top level of the module file, not inside a function.
- Data Missing in `kwargs` or `response_obj`:
  - Print the entire `kwargs` dict and `response_obj` inside your callback to see exactly what's available for that specific event type (see the debug sketch below). Remember, `exception` is only in failure events, and `response_cost` might be missing sometimes.
  - Check your LiteLLM version; newer versions might add more context.
- Callback Seems Slow / Blocking Application:
  - Review the code inside the callback. Are you doing network I/O in a sync callback? Are you doing heavy computation?
  - Switch to `async` callbacks if using the proxy or `acompletion`.
  - Offload heavy work to background tasks if necessary.
- Errors Inside the Callback:
  - Implement `try...except` within your callback methods (see Recipe/Advanced Considerations) and log these internal errors somewhere reliable.
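When you need to inspect exactly what a callback receives, a quick, temporary debug dump like the sketch below can help (the handler name is illustrative); `json.dumps(..., default=str)` keeps datetimes, exceptions, and response objects from raising serialization errors:

```python
# Temporary debugging aid: dump everything a callback receives.
# default=str stringifies non-serializable values. Remove or gate this in production.
import json
from litellm.integrations.custom_logger import CustomLogger

class DebugDumpHandler(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        try:
            print("--- kwargs ---")
            print(json.dumps(kwargs, indent=2, default=str))
            print("--- response_obj ---")
            print(json.dumps(response_obj, indent=2, default=str) if response_obj else "None")
        except Exception as e:
            print(f"[DEBUG_DUMP] Could not serialize callback data: {e}")
```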
12. Best Practices Summary
- Start Simple: Use built-in callbacks if they meet your needs.
- Be Specific: Only implement the custom callback methods (`log_success_event`, etc.) you actually require.
- Prefer Async (Especially for Proxy): If doing I/O or using the proxy/`acompletion`, use `async def` methods.
- Handle Errors Gracefully: Always wrap your callback logic in `try...except`. A failing callback shouldn't break everything else.
- Keep it Lean & Fast: Avoid slow or complex operations directly in callbacks. Offload if needed.
- Use Standard Logging: Leverage Python's `logging` module for robust file/console logging within callbacks.
- Know Your Data: Understand what's available in `kwargs` and `response_obj` for each event type. Print them out to explore!
- Isolate Dependencies: Be mindful of libraries your callbacks import, especially for proxy deployments.
- Configure Appropriately: Use `litellm.callbacks = [...]` for direct use and `config.yaml` for the proxy.
LiteLLM Callbacks are an incredibly powerful feature for gaining deep insights and control over your LLM interactions. By understanding the two main approaches (built-in vs. custom), the event lifecycle, the data provided via `kwargs` and `response_obj`, and the nuances of sync vs. async execution, you can now build robust logging, monitoring, cost tracking, and custom integrations tailored precisely to your needs.
Don't be afraid to experiment! Start with simple printing handlers, explore the data available, and gradually build up complexity. Happy coding!