Building a serverless GenAI API with FastAPI, AWS, and CircleCI
The advancement of AI has empowered businesses to incorporate intelligent automation into their applications. A serverless Generative AI (GenAI) API enables developers to harness cutting-edge AI models without the burden of infrastructure management. This guide walks you through building a scalable and cost-effective GenAI API using FastAPI, a high-performance Python framework with built-in async support and seamless AWS integration. By deploying FastAPI on AWS Lambda with AWS API Gateway, you can create a fully managed, pay-per-use architecture that eliminates server maintenance.
To simplify development and deployment, you will set up a Continuous Integration and Continuous Deployment (CI/CD) pipeline with CircleCI, automating testing, building, and deployment. With CircleCI’s GitHub integration, you’ll achieve continuous delivery, reducing errors and accelerating development cycles. This combination of FastAPI, AWS Lambda, and CircleCI ensures a robust, scalable, and efficient GenAI API ready for real-world applications.
You can check out the complete source code on GitHub, but this tutorial will guide you to build it from scratch.
Prerequisites
Before diving into the process of building a serverless GenAI API, there are several prerequisites you need to have in place. Here is a breakdown of what you will need:
- AWS Account: You will need an active AWS account to deploy the serverless application using AWS services like Lambda and API Gateway.
- AWS CLI: To install and configure the AWS CLI, follow the instructions in the AWS CLI documentation. Once installed, configure it with aws configure and provide your AWS access key, secret key, region, and output format, as shown in the example after this list.
- Basic Understanding of RESTful APIs, FastAPI, and GenAI Models: This project assumes a basic understanding of RESTful APIs, FastAPI, and GenAI models. REST APIs enable communication between clients (like web or mobile apps) and servers, while FastAPI is a fast, modern Python framework for building APIs with automatic documentation generation. GenAI models, such as OpenAI’s GPT, generate human-like text and other outputs, and in this project, you will integrate OpenAI into the API to provide responses to user queries.
- GitHub and CircleCI Accounts: You will need a GitHub account to host your project’s repository and a CircleCI account to automate testing and deployment through CI/CD.
- OpenAI API Key: To access OpenAI’s GPT models, you will need an API key. You can sign up for an API key on the OpenAI website.
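For reference, a typical aws configure session looks like this. The values shown are placeholders, not real credentials:
aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: your-secret-access-key
Default region name [None]: eu-central-1
Default output format [None]: json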
Setting Up the FastAPI GenAI Server
FastAPI is a modern, high-performance web framework for building APIs with Python. It is particularly well-suited for LLM-based APIs due to its speed, simplicity, and support for asynchronous operations, which enable handling multiple requests efficiently. For this project, you will integrate OpenAI’s GPT-4o-mini model via its API to generate AI-driven responses with minimal setup.
Installing Dependencies and GenAI Libraries
First, clone the repository containing the project code.
git clone https://github.com/CIRCLECI-GWP/genai-aws-circleci.git
cd genai-aws-circleci
Then, install uv from Astral, a fast Python package manager, instead of pip. Written in Rust, uv is chosen for its speed, efficient dependency resolution, and built-in support for managing virtual environments. You can install it using the following command.
curl -LsSf https://astral.sh/uv/install.sh | sh
Once you have installed uv, run the following commands to install dependencies and activate the virtual environment.
uv sync
source .venv/bin/activate
The uv sync command will:
- Install the dependencies defined in pyproject.toml.
- Automatically create a virtual environment (.venv).
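If you would rather assemble the project from scratch instead of cloning, you can recreate an equivalent environment with uv. This is a sketch; the authoritative dependency list and versions live in the repository's pyproject.toml:
uv init genai-aws-circleci
cd genai-aws-circleci
uv add fastapi uvicorn openai python-dotenv mangum boto3
uv add --dev pytest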
Finally, create a .env file in the root directory of your repository and add your OPENAI_API_KEY to it.
OPENAI_API_KEY=your-openai-key
Define Endpoints
With dependencies installed, you can now define the FastAPI endpoints to interact with the GPT-4o-mini model. This implementation, found in main.py, does not yet include AWS integration, which you will cover in the next section.
Code Breakdown:
- PromptRequest (Pydantic Model): Defines the expected structure of incoming requests. It ensures that each request contains a prompt string.
- get_openai_api_key(): Retrieves the OpenAI API key from the environment variables. If the key is missing, it raises an HTTPException to prevent unauthorized API calls.
- get_openai_client(): Uses get_openai_api_key() to fetch the API key and initialize the OpenAI client. If initialization fails, an exception is raised.
- Root Endpoint (/): A simple health check that confirms the API is running.
- Generate Endpoint (/generate): Accepts a POST request containing a prompt, passes it to OpenAI’s GPT-4o-mini, and returns the generated response. It depends on get_openai_client() to ensure a valid API connection.
- OpenAI API Call: Uses chat.completions.create() to send the user’s prompt to OpenAI and returns the generated response.
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Initialize FastAPI
app = FastAPI()

# Pydantic model to define expected structure of request
class PromptRequest(BaseModel):
    """Model for request validation."""
    prompt: str

def get_openai_api_key():
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise HTTPException(status_code=500, detail="OPENAI_API_KEY not found in environment variables")
    return api_key

def get_openai_client():
    try:
        api_key = get_openai_api_key()
        return OpenAI(api_key=api_key)
    except HTTPException as e:
        raise HTTPException(status_code=500, detail="Failed to initialize OpenAI client: " + str(e.detail))

@app.get("/")
async def root():
    """Root endpoint to confirm API is running."""
    return {"message": "Welcome to the GenAI API"}

@app.post("/generate")
async def generate_text(request: PromptRequest, client: OpenAI = Depends(get_openai_client)):
    if not client:
        raise HTTPException(status_code=500, detail="OpenAI API client not initialized.")
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": request.prompt}],
            max_tokens=200
        )
        if not response.choices:
            raise ValueError("No response received from OpenAI API.")
        return {"response": response.choices[0].message.content}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Run the app with uvicorn
if __name__ == "__main__":
    import uvicorn
    uvicorn.run("main:app", host="127.0.0.1", port=8000, reload=True)
Running the FastAPI application
To run the FastAPI application, execute the command below:
uv run main.py
The command will start the FastAPI server locally at http://127.0.0.1:8000. Hot-reloading during development is enabled by the reload=True option passed to uvicorn.run() in main.py, and FastAPI automatically serves interactive API documentation at http://127.0.0.1:8000/docs. You can use the cURL command below to make a POST request to the /generate endpoint with a prompt.
curl -X 'POST' 'http://127.0.0.1:8000/generate' \
-H 'Content-Type: application/json' \
-d '{"prompt": "Tell me a fun fact about AI"}'
You should receive a response like this:
{ "response": "AI was first introduced as a field in 1956 at a conference at Dartmouth College. It was the birth of modern artificial intelligence!" }
Your FastAPI-based GenAI API is now ready for local testing. Next, you will integrate AWS Lambda and API Gateway for serverless deployment.
Deploying FastAPI to AWS Lambda
To deploy the FastAPI GenAI server to AWS Lambda, you will need to set up a few key components, namely:
- Mangum, for making FastAPI compatible with AWS Lambda
- The Lambda function handler
- AWS API Gateway, to expose the FastAPI endpoints
- The OPENAI_API_KEY added into AWS Secrets Manager
Mangum is a Python library that allows ASGI applications (like FastAPI) to run on AWS Lambda. It acts as an adapter, making FastAPI compatible with AWS Lambda’s event-driven architecture and API Gateway.
Creating an AWS Lambda Function Handler with Mangum
Once your FastAPI application is set up locally, you will need to wrap it in a handler that AWS Lambda can invoke when requests come in via API Gateway. This is where Mangum comes in. Modify your main.py by importing Mangum and wrapping the FastAPI app. Add the handler right after defining your endpoints.
import mangum
# Create the handler for AWS Lambda
handler = mangum.Mangum(app)
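To see what the adapter does, you can invoke the handler locally with a hand-built API Gateway event. The event below is an abridged sketch of the REST API (payload v1.0) format that API Gateway sends to Lambda; real events carry many more fields:
from main import handler

# Abridged API Gateway REST (v1.0) style event; real events include more fields
event = {
    "resource": "/",
    "path": "/",
    "httpMethod": "GET",
    "headers": {"Host": "example.execute-api.eu-central-1.amazonaws.com"},
    "multiValueHeaders": {},
    "queryStringParameters": None,
    "multiValueQueryStringParameters": None,
    "requestContext": {
        "resourcePath": "/",
        "httpMethod": "GET",
        "path": "/dev/",
        "stage": "dev",
        "identity": {"sourceIp": "127.0.0.1"},
    },
    "body": None,
    "isBase64Encoded": False,
}

# Mangum translates the event into an ASGI request and returns a Lambda-style response
response = handler(event, None)
print(response["statusCode"], response["body"])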
When your app is running in AWS, you need to ensure that the OPENAI_API_KEY is accessed securely. You can add it into AWS Secrets Manager and update main.py so that, depending on where the app runs, the corresponding OPENAI_API_KEY is used.
The command below securely stores the OPENAI_API_KEY in AWS Secrets Manager, ensuring that sensitive credentials are not hardcoded in the application.
- create-secret: Creates a new secret in AWS Secrets Manager.
- --name: Specifies the unique name of the secret.
- --description: Provides a brief description of the secret.
- --secret-string: Stores the actual secret as a JSON object, where YOUR_OPENAI_API_KEY should be replaced with the actual API key.
aws secretsmanager create-secret \
--name openai/api_key \
--description "OpenAI API Key for GenAI API" \
--secret-string '{"OPENAI_API_KEY":"YOUR_OPENAI_API_KEY"}'
Once stored, the application can retrieve this secret dynamically.
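You can confirm the secret is stored correctly with a quick lookup. Note that this prints the secret to your terminal, so run it only in a trusted environment:
aws secretsmanager get-secret-value \
--secret-id openai/api_key \
--query SecretString \
--output text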
Then, update the get_openai_api_key function in the main.py file to allow retrieval of the key from the .env file when running locally and from AWS Secrets Manager when running on Lambda.
Code Breakdown:
- If running on AWS Lambda (detected via the AWS_LAMBDA_FUNCTION_NAME environment variable): it fetches the API key securely from AWS Secrets Manager. A Secrets Manager client is created, and the stored secret (openai/api_key) is retrieved and parsed.
- If running locally: it loads the API key from the .env file via environment variables.
import boto3
import json
import logging

# Module-level logger (define once near the top of main.py)
logger = logging.getLogger(__name__)

def get_openai_api_key():
    # Check if running locally or in Lambda
    if os.environ.get("AWS_LAMBDA_FUNCTION_NAME"):
        # Running in Lambda, get key from AWS Secrets Manager
        secret_name = "openai/api_key"
        try:
            # Create a Secrets Manager client; use the region where you created the secret
            session = boto3.session.Session()
            client = session.client(service_name='secretsmanager', region_name="eu-central-1")

            # Get the secret API key
            get_secret_value_response = client.get_secret_value(SecretId=secret_name)
            secret = get_secret_value_response['SecretString']
            secret_dict = json.loads(secret)
            api_key = secret_dict.get("OPENAI_API_KEY")
            if not api_key:
                raise KeyError("OPENAI_API_KEY not found in Secrets Manager.")
            return api_key
        except Exception:
            raise HTTPException(status_code=500, detail="Failed to retrieve API key from Secrets Manager")
    else:
        # Running locally, get key from .env file
        api_key = os.environ.get("OPENAI_API_KEY")
        if not api_key:
            raise HTTPException(status_code=500, detail="OPENAI_API_KEY not found in environment variables")
        logger.info("Successfully retrieved OpenAI API key from .env file.")
        return api_key
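With this change in place, you can optionally exercise the Secrets Manager code path from your own machine by faking the Lambda environment variable. This is a quick sanity check, and it assumes your local AWS credentials are allowed to read the secret:
AWS_LAMBDA_FUNCTION_NAME=local-test uv run python -c 'from main import get_openai_api_key; print(get_openai_api_key()[:8])'
If the command prints the first few characters of your key, the Secrets Manager lookup works end to end.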
Testing and Validating the API
Testing and validating the API is crucial to ensure it functions correctly before deployment. Below are several tests using the pytest and unittest packages. The unit tests check that the app runs both locally and in AWS Lambda, ensuring that requests work in both setups.
These tests validate the core functionality of the FastAPI-based GenAI server by covering different scenarios:
- Basic API Functionality: Tests the root (/) endpoint and the /generate endpoint with a valid prompt.
- Input Validation: Ensures that invalid input (e.g., a missing prompt) returns appropriate error responses.
- Error Handling: Mocks scenarios such as missing API keys and verifies that the API correctly returns error messages.
- Mocking External Dependencies: Uses unittest.mock.patch to simulate OpenAI API calls and AWS Secrets Manager, ensuring API integration works as expected without relying on actual external services.
from fastapi.testclient import TestClient
from fastapi import HTTPException
from unittest.mock import patch, MagicMock
from main import app
import pytest
import os

@pytest.fixture
def client():
    """Fixture for FastAPI test client"""
    return TestClient(app)

def test_root_endpoint(client):
    """Test the root endpoint"""
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Welcome to the GenAI API"}

def test_generate_endpoint(client):
    """Test the generate endpoint"""
    response = client.post("/generate", json={"prompt": "Tell me a joke"})

    # Assert successful status code and a non-empty response string
    response_data = response.json()
    assert response.status_code == 200
    assert "response" in response_data
    assert isinstance(response_data["response"], str)
    assert len(response_data["response"]) > 0

def test_generate_invalid_input(client):
    """Test the generate endpoint with invalid input"""
    # Test with missing prompt field
    response = client.post("/generate", json={})

    # Assert validation error
    assert response.status_code == 422  # Unprocessable Entity
    assert "prompt" in response.json()["detail"][0]["loc"]

@patch("main.get_openai_api_key")  # Patch the get_openai_api_key function in main.py
def test_generate_text_missing_api_key(mock_get_api_key, client):
    """Test the generate endpoint when the API key is missing"""
    # Setup mock to raise an HTTPException
    mock_get_api_key.side_effect = HTTPException(status_code=500, detail="API key not found")

    # Test with a sample prompt
    response = client.post("/generate", json={"prompt": "Tell me a joke"})

    # Assert error status code
    assert response.status_code == 500  # Internal Server Error
    assert "API key not found" in response.json()["detail"]

# Test function to mock OpenAI client behavior
@patch("main.get_openai_client")  # Patch the get_openai_client function in main.py
def test_mock_client(mock_get_client):
    """Test the OpenAI client behavior with a simplified mock client"""
    # Set up the mock OpenAI client and the mock response in one go
    mock_response = MagicMock()
    mock_response.choices = [
        MagicMock(
            message=MagicMock(content="Mock response")  # Directly mock the message and its content
        )
    ]

    # When `chat.completions.create()` is called, return the mock response
    mock_get_client.return_value.chat.completions.create.return_value = mock_response

    # Simulate calling the OpenAI client's `chat.completions.create()`
    result = mock_get_client.return_value.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a joke"}],
        max_tokens=200
    )

    # Assert the mock response
    assert result == mock_response
    assert result.choices[0].message.content == "Mock response"

@patch("boto3.session.Session")
def test_get_openai_api_key_aws_environment(mock_session, client):
    """Test retrieving API key from AWS Secrets Manager"""
    # Set up environment to simulate AWS Lambda
    with patch.dict(os.environ, {"AWS_LAMBDA_FUNCTION_NAME": "test-function"}, clear=True):
        # Create mock for the entire boto3 session and client chain
        mock_client = MagicMock()
        mock_session.return_value.client.return_value = mock_client

        # Mock the get_secret_value response
        mock_response = {
            'SecretString': '{"OPENAI_API_KEY": "test-api-key"}'
        }
        mock_client.get_secret_value.return_value = mock_response

        # Call the function under test
        from main import get_openai_api_key
        api_key = get_openai_api_key()

        # Assertions
        mock_session.assert_called_once()
        mock_session.return_value.client.assert_called_with(
            service_name='secretsmanager',
            region_name="eu-central-1"
        )
        mock_client.get_secret_value.assert_called_with(SecretId="openai/api_key")
        assert api_key == "test-api-key"
Mocking is an essential technique for testing app behavior before deploying it to production environments. It helps simulate API interactions, allowing you to check how the application would respond under various conditions without making real calls to external services.
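Tests run locally through uv, which resolves pytest from the project environment (this assumes pytest is declared as a dependency in pyproject.toml):
uv run pytest -v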
API Deployment to AWS with AWS SAM
To expose your FastAPI endpoints using AWS API Gateway, you will use the AWS Serverless Application Model (AWS SAM). AWS SAM simplifies building and deploying serverless applications on AWS by providing a shorthand syntax for defining resources such as Lambda functions, API Gateway, IAM roles, and other related services, all within a template.yaml file.
Key components of the template.yaml file:
- Lambda Function: The serverless function that will execute the FastAPI application logic.
- API Gateway: The API Gateway exposes the FastAPI application as HTTP endpoints.
- Secrets Manager: Stores the OpenAI API key securely, so it can be retrieved by Lambda.
- Policies: Defines necessary IAM roles and policies that allow Lambda to interact with other AWS services (e.g., Secrets Manager).
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: GenAI API with FastAPI and Lambda

# Global variables
Globals:
  Function: # Lambda function resources in the template
    Timeout: 30
    MemorySize: 256
    Runtime: python3.11
    Architectures:
      - x86_64
    Environment:
      Variables:
        OPENAI_API_KEY_SECRET_ARN: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:openai/api_key-*'
  Api:
    EndpointConfiguration: REGIONAL
    Cors:
      AllowMethods: "'*'"
      AllowHeaders: "'Content-Type,Authorization'"
      AllowOrigin: "'*'"

# AWS resources that will be created
Resources:
  # API Gateway
  GenAIApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: dev
      EndpointConfiguration: REGIONAL
      Cors:
        AllowMethods: "'*'"
        AllowHeaders: "'Content-Type,Authorization'"
        AllowOrigin: "'*'"

  # Lambda function
  GenAIFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./app/
      Handler: main.handler
      Description: FastAPI GenAI service using OpenAI API
      Policies:
        - AWSLambdaBasicExecutionRole
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - secretsmanager:GetSecretValue
              Resource: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:openai/api_key-*'
      Environment:
        Variables:
          OPENAI_API_KEY_SECRET_ARN: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:openai/api_key-*'
      Events:
        RootPath:
          Type: Api
          Properties:
            RestApiId: !Ref GenAIApi
            Path: /
            Method: ANY
        GeneratePath:
          Type: Api
          Properties:
            RestApiId: !Ref GenAIApi
            Path: /generate
            Method: ANY

Outputs:
  GenAIApiEndpoint:
    Description: API Gateway endpoint URL for the GenAI service
    Value: !Sub 'https://${GenAIApi}.execute-api.${AWS::Region}.amazonaws.com/dev/'
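Before deploying, you can catch syntax errors in the template early by validating it with the SAM CLI (assuming the SAM CLI is installed):
sam validate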
Deploying the FastAPI Application
Once the template.yaml file is ready, the next step is to deploy your application using AWS SAM. Before deploying, you will need to create a Lambda deployment package that includes both the application code, main.py, and the necessary dependencies.
To make this easier, you will use a bash script (build-sam.sh) to automate the process. This script will create a folder named app where main.py will be copied, and the dependencies from pyproject.toml will be transferred into a requirements.txt file, which works seamlessly with AWS Lambda.
#!/bin/bash
set -e
echo "