Building a serverless GenAI API with FastAPI, AWS, and CircleCI
The advancement of AI has empowered businesses to incorporate intelligent automation into their applications. A serverless Generative AI (GenAI) API enables developers to harness cutting-edge AI models without the burden of infrastructure management. This guide walks you through building a scalable and cost-effective GenAI API using FastAPI, a high-performance Python framework with built-in async support and seamless AWS integration. By deploying FastAPI on AWS Lambda with AWS API Gateway, you can create a fully managed, pay-per-use architecture that eliminates server maintenance.
To simplify development and deployment, you will set up a Continuous Integration and Continuous Deployment (CI/CD) pipeline with CircleCI, automating testing, building, and deployment. With CircleCI’s GitHub integration, you’ll achieve continuous delivery, reducing errors and accelerating development cycles. This combination of FastAPI, AWS Lambda, and CircleCI ensures a robust, scalable, and efficient GenAI API ready for real-world applications.
You can check out the complete source code on GitHub, but this tutorial will guide you to build it from scratch.
Prerequisites
Before diving into the process of building a serverless GenAI API, there are several prerequisites you need to have in place. Here is a breakdown of what you will need:
- AWS Account: You will need an active AWS account to deploy the serverless application using AWS services like Lambda and API Gateway.
- AWS CLI: To install and configure the AWS CLI, follow the instructions in the AWS CLI documentation. Once installed, configure it with aws configure and provide your AWS access key, secret key, region, and output format, as shown in the example after this list.
- Basic Understanding of RESTful APIs, FastAPI, and GenAI Models: This project assumes a basic understanding of RESTful APIs, FastAPI, and GenAI models. REST APIs enable communication between clients (like web or mobile apps) and servers, while FastAPI is a fast, modern Python framework for building APIs with automatic documentation generation. GenAI models, such as OpenAI’s GPT, generate human-like text and other outputs, and in this project, you will integrate OpenAI into the API to provide responses to user queries.
- GitHub and CircleCI Accounts: You will need a GitHub account to host your project’s repository and a CircleCI account to automate testing and deployment through CI/CD.
- OpenAI API Key: To access OpenAI’s GPT models, you will need an API key. You can sign up for an API key on the OpenAI website.
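For reference, a typical aws configure session looks like this. The values shown are placeholders, not real credentials:
aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: your-secret-access-key
Default region name [None]: eu-central-1
Default output format [None]: json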
Setting Up the FastAPI GenAI Server
FastAPI is a modern, high-performance web framework for building APIs with Python. It is particularly well-suited for LLM-based APIs due to its speed, simplicity, and support for asynchronous operations, which enable handling multiple requests efficiently. For this project, you will integrate OpenAI’s GPT-4o-mini model via its API to generate AI-driven responses with minimal setup.
Installing Dependencies and GenAI Libraries
First, clone the repository containing the project code.
git clone https://github.com/CIRCLECI-GWP/genai-aws-circleci.git
cd genai-aws-circleci
Then, install uv from Astral, a fast Python package manager, instead of pip. Written in Rust, uv is chosen for its speed, efficient dependency resolution, and built-in support for managing virtual environments. You can install it using the following command.
curl -LsSf https://astral.sh/uv/install.sh | sh
Once you have installed uv, run the following commands to install dependencies and activate the virtual environment.
uv sync
source .venv/bin/activate
The uv sync command will:
- Install the dependencies defined in pyproject.toml.
- Automatically create a virtual environment (.venv).
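If you would rather assemble the project from scratch instead of cloning, you can recreate an equivalent environment with uv. This is a sketch; the authoritative dependency list and versions live in the repository's pyproject.toml:
uv init genai-aws-circleci
cd genai-aws-circleci
uv add fastapi uvicorn openai python-dotenv mangum boto3
uv add --dev pytest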
Finally, create a .env file in the root directory of your repository and add your OPENAI_API_KEY to it.
OPENAI_API_KEY=your-openai-key
Define Endpoints
With dependencies installed, you can now define the FastAPI endpoints to interact with the GPT-4o-mini model. This implementation, found in main.py, does not yet include AWS integration, which you will cover in the next section.
Code Breakdown:
- PromptRequest (Pydantic Model): Defines the expected structure of incoming requests. It ensures that each request contains a prompt string.
- get_openai_api_key(): Retrieves the OpenAI API key from the environment variables. If the key is missing, it raises an HTTPException to prevent unauthorized API calls.
- get_openai_client(): Uses get_openai_api_key() to fetch the API key and initialize the OpenAI client. If initialization fails, an exception is raised.
- Root Endpoint (/): A simple health check that confirms the API is running.
- Generate Endpoint (/generate): Accepts a POST request containing a prompt, passes it to OpenAI’s GPT-4o-mini, and returns the generated response. It depends on get_openai_client() to ensure a valid API connection.
- OpenAI API Call: Uses chat.completions.create() to send the user’s prompt to OpenAI and returns the generated response.
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Initialize FastAPI
app = FastAPI()

# Pydantic model to define expected structure of request
class PromptRequest(BaseModel):
    """Model for request validation."""
    prompt: str

def get_openai_api_key():
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise HTTPException(status_code=500, detail="OPENAI_API_KEY not found in environment variables")
    return api_key

def get_openai_client():
    try:
        api_key = get_openai_api_key()
        return OpenAI(api_key=api_key)
    except HTTPException as e:
        raise HTTPException(status_code=500, detail="Failed to initialize OpenAI client: " + str(e.detail))

@app.get("/")
async def root():
    """Root endpoint to confirm API is running."""
    return {"message": "Welcome to the GenAI API"}

@app.post("/generate")
async def generate_text(request: PromptRequest, client: OpenAI = Depends(get_openai_client)):
    if not client:
        raise HTTPException(status_code=500, detail="OpenAI API client not initialized.")
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": request.prompt}],
            max_tokens=200
        )
        if not response.choices:
            raise ValueError("No response received from OpenAI API.")
        return {"response": response.choices[0].message.content}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Run the app with uvicorn
if __name__ == "__main__":
    import uvicorn
    uvicorn.run("main:app", host="127.0.0.1", port=8000, reload=True)
Running the FastAPI application
To run the FastAPI application, execute the command below:
uv run main.py
The command will start the FastAPI server locally at http://127.0.0.1:8000. Hot-reloading during development is enabled by the reload=True option passed to uvicorn.run() in main.py, and FastAPI automatically serves interactive API documentation at http://127.0.0.1:8000/docs. You can use the cURL command below to make a POST request to the /generate endpoint with a prompt.
curl -X 'POST' 'http://127.0.0.1:8000/generate' \
-H 'Content-Type: application/json' \
-d '{"prompt": "Tell me a fun fact about AI"}'
You should receive a response like this:
{ "response": "AI was first introduced as a field in 1956 at a conference at Dartmouth College. It was the birth of modern artificial intelligence!" }
Your FastAPI-based GenAI API is now ready for local testing. Next, you will integrate AWS Lambda and API Gateway for serverless deployment.
Deploying FastAPI to AWS Lambda
To deploy the FastAPI GenAI server to AWS Lambda, you will need to set up a few key components, namely:
- Mangum, for making FastAPI compatible with AWS Lambda
- The Lambda function handler
- AWS API Gateway, to expose the FastAPI endpoints
- The OPENAI_API_KEY added into AWS Secrets Manager
Mangum is a Python library that allows ASGI applications (like FastAPI) to run on AWS Lambda. It acts as an adapter, making FastAPI compatible with AWS Lambda’s event-driven architecture and API Gateway.
Creating an AWS Lambda Function Handler with Mangum
Once your FastAPI application is set up locally, you will need to wrap it in a handler that AWS Lambda can invoke when requests come in via API Gateway. This is where Mangum comes in. Modify your main.py by importing Mangum and wrapping the FastAPI app. Add the handler right after defining your endpoints.
import mangum
# Create the handler for AWS Lambda
handler = mangum.Mangum(app)
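To see what the adapter does, you can invoke the handler locally with a hand-built API Gateway event. The event below is an abridged sketch of the REST API (payload v1.0) format that API Gateway sends to Lambda; real events carry many more fields:
from main import handler

# Abridged API Gateway REST (v1.0) style event; real events include more fields
event = {
    "resource": "/",
    "path": "/",
    "httpMethod": "GET",
    "headers": {"Host": "example.execute-api.eu-central-1.amazonaws.com"},
    "multiValueHeaders": {},
    "queryStringParameters": None,
    "multiValueQueryStringParameters": None,
    "requestContext": {
        "resourcePath": "/",
        "httpMethod": "GET",
        "path": "/dev/",
        "stage": "dev",
        "identity": {"sourceIp": "127.0.0.1"},
    },
    "body": None,
    "isBase64Encoded": False,
}

# Mangum translates the event into an ASGI request and returns a Lambda-style response
response = handler(event, None)
print(response["statusCode"], response["body"])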
When your app is running in AWS, you need to ensure that the OPENAI_API_KEY is accessed securely. You can add it into AWS Secrets Manager and update main.py so that, depending on where the app runs, the corresponding OPENAI_API_KEY is used.
The command below securely stores the OPENAI_API_KEY in AWS Secrets Manager, ensuring that sensitive credentials are not hardcoded in the application.
- create-secret: Creates a new secret in AWS Secrets Manager.
- --name: Specifies the unique name of the secret.
- --description: Provides a brief description of the secret.
- --secret-string: Stores the actual secret as a JSON object, where YOUR_OPENAI_API_KEY should be replaced with the actual API key.
aws secretsmanager create-secret \
--name openai/api_key \
--description "OpenAI API Key for GenAI API" \
--secret-string '{"OPENAI_API_KEY":"YOUR_OPENAI_API_KEY"}'
Once stored, the application can retrieve this secret dynamically.
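You can confirm the secret is stored correctly with a quick lookup. Note that this prints the secret to your terminal, so run it only in a trusted environment:
aws secretsmanager get-secret-value \
--secret-id openai/api_key \
--query SecretString \
--output text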
Then, update the get_openai_api_key function in the main.py file to allow retrieval of the key from the .env file when running locally and from AWS Secrets Manager when running on Lambda.
Code Breakdown:
- If running on AWS Lambda (detected via the AWS_LAMBDA_FUNCTION_NAME environment variable): it fetches the API key securely from AWS Secrets Manager. A Secrets Manager client is created, and the stored secret (openai/api_key) is retrieved and parsed.
- If running locally: it loads the API key from the .env file via environment variables.
import boto3
import json
import logging

# Module-level logger (define once near the top of main.py)
logger = logging.getLogger(__name__)

def get_openai_api_key():
    # Check if running locally or in Lambda
    if os.environ.get("AWS_LAMBDA_FUNCTION_NAME"):
        # Running in Lambda, get key from AWS Secrets Manager
        secret_name = "openai/api_key"
        try:
            # Create a Secrets Manager client; use the region where you created the secret
            session = boto3.session.Session()
            client = session.client(service_name='secretsmanager', region_name="eu-central-1")

            # Get the secret API key
            get_secret_value_response = client.get_secret_value(SecretId=secret_name)
            secret = get_secret_value_response['SecretString']
            secret_dict = json.loads(secret)
            api_key = secret_dict.get("OPENAI_API_KEY")
            if not api_key:
                raise KeyError("OPENAI_API_KEY not found in Secrets Manager.")
            return api_key
        except Exception:
            raise HTTPException(status_code=500, detail="Failed to retrieve API key from Secrets Manager")
    else:
        # Running locally, get key from .env file
        api_key = os.environ.get("OPENAI_API_KEY")
        if not api_key:
            raise HTTPException(status_code=500, detail="OPENAI_API_KEY not found in environment variables")
        logger.info("Successfully retrieved OpenAI API key from .env file.")
        return api_key
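With this change in place, you can optionally exercise the Secrets Manager code path from your own machine by faking the Lambda environment variable. This is a quick sanity check, and it assumes your local AWS credentials are allowed to read the secret:
AWS_LAMBDA_FUNCTION_NAME=local-test uv run python -c 'from main import get_openai_api_key; print(get_openai_api_key()[:8])'
If the command prints the first few characters of your key, the Secrets Manager lookup works end to end.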
Testing and Validating the API
Testing and validating the API is crucial to ensure it functions correctly before deployment. Below are several tests using the pytest and unittest packages. The unit tests check that the app runs both locally and in AWS Lambda, ensuring that requests work in both setups.
These tests validate the core functionality of the FastAPI-based GenAI server by covering different scenarios:
- Basic API Functionality: Tests the root (/) endpoint and the /generate endpoint with a valid prompt.
- Input Validation: Ensures that invalid input (e.g., a missing prompt) returns appropriate error responses.
- Error Handling: Mocks scenarios such as missing API keys and verifies that the API correctly returns error messages.
- Mocking External Dependencies: Uses unittest.mock.patch to simulate OpenAI API calls and AWS Secrets Manager, ensuring API integration works as expected without relying on actual external services.
from fastapi.testclient import TestClient
from fastapi import HTTPException
from unittest.mock import patch, MagicMock
from main import app
import pytest
import os

@pytest.fixture
def client():
    """Fixture for FastAPI test client"""
    return TestClient(app)

def test_root_endpoint(client):
    """Test the root endpoint"""
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Welcome to the GenAI API"}

def test_generate_endpoint(client):
    """Test the generate endpoint"""
    response = client.post("/generate", json={"prompt": "Tell me a joke"})

    # Assert successful status code and a non-empty response string
    response_data = response.json()
    assert response.status_code == 200
    assert "response" in response_data
    assert isinstance(response_data["response"], str)
    assert len(response_data["response"]) > 0

def test_generate_invalid_input(client):
    """Test the generate endpoint with invalid input"""
    # Test with missing prompt field
    response = client.post("/generate", json={})

    # Assert validation error
    assert response.status_code == 422  # Unprocessable Entity
    assert "prompt" in response.json()["detail"][0]["loc"]

@patch("main.get_openai_api_key")  # Patch the get_openai_api_key function in main.py
def test_generate_text_missing_api_key(mock_get_api_key, client):
    """Test the generate endpoint when the API key is missing"""
    # Setup mock to raise an HTTPException
    mock_get_api_key.side_effect = HTTPException(status_code=500, detail="API key not found")

    # Test with a sample prompt
    response = client.post("/generate", json={"prompt": "Tell me a joke"})

    # Assert error status code
    assert response.status_code == 500  # Internal Server Error
    assert "API key not found" in response.json()["detail"]

# Test function to mock OpenAI client behavior
@patch("main.get_openai_client")  # Patch the get_openai_client function in main.py
def test_mock_client(mock_get_client):
    """Test the OpenAI client behavior with a simplified mock client"""
    # Set up the mock OpenAI client and the mock response in one go
    mock_response = MagicMock()
    mock_response.choices = [
        MagicMock(
            message=MagicMock(content="Mock response")  # Directly mock the message and its content
        )
    ]

    # When `chat.completions.create()` is called, return the mock response
    mock_get_client.return_value.chat.completions.create.return_value = mock_response

    # Simulate calling the OpenAI client's `chat.completions.create()`
    result = mock_get_client.return_value.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a joke"}],
        max_tokens=200
    )

    # Assert the mock response
    assert result == mock_response
    assert result.choices[0].message.content == "Mock response"

@patch("boto3.session.Session")
def test_get_openai_api_key_aws_environment(mock_session, client):
    """Test retrieving API key from AWS Secrets Manager"""
    # Set up environment to simulate AWS Lambda
    with patch.dict(os.environ, {"AWS_LAMBDA_FUNCTION_NAME": "test-function"}, clear=True):
        # Create mock for the entire boto3 session and client chain
        mock_client = MagicMock()
        mock_session.return_value.client.return_value = mock_client

        # Mock the get_secret_value response
        mock_response = {
            'SecretString': '{"OPENAI_API_KEY": "test-api-key"}'
        }
        mock_client.get_secret_value.return_value = mock_response

        # Call the function under test
        from main import get_openai_api_key
        api_key = get_openai_api_key()

        # Assertions
        mock_session.assert_called_once()
        mock_session.return_value.client.assert_called_with(
            service_name='secretsmanager',
            region_name="eu-central-1"
        )
        mock_client.get_secret_value.assert_called_with(SecretId="openai/api_key")
        assert api_key == "test-api-key"
Mocking is an essential technique for testing app behavior before deploying it to production environments. It helps simulate API interactions, allowing you to check how the application would respond under various conditions without making real calls to external services.
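Tests run locally through uv, which resolves pytest from the project environment (this assumes pytest is declared as a dependency in pyproject.toml):
uv run pytest -v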
API Deployment to AWS with AWS SAM
To expose your FastAPI endpoints using AWS API Gateway, you will use the AWS Serverless Application Model (AWS SAM). AWS SAM simplifies building and deploying serverless applications on AWS by providing a shorthand syntax for defining resources such as Lambda functions, API Gateway, IAM roles, and other related services, all within a template.yaml file.
Key components of the template.yaml file:
- Lambda Function: The serverless function that will execute the FastAPI application logic.
- API Gateway: The API Gateway exposes the FastAPI application as HTTP endpoints.
- Secrets Manager: Stores the OpenAI API key securely, so it can be retrieved by Lambda.
- Policies: Defines necessary IAM roles and policies that allow Lambda to interact with other AWS services (e.g., Secrets Manager).
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: GenAI API with FastAPI and Lambda

# Global variables
Globals:
  Function: # Lambda function resources in the template
    Timeout: 30
    MemorySize: 256
    Runtime: python3.11
    Architectures:
      - x86_64
    Environment:
      Variables:
        OPENAI_API_KEY_SECRET_ARN: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:openai/api_key-*'
  Api:
    EndpointConfiguration: REGIONAL
    Cors:
      AllowMethods: "'*'"
      AllowHeaders: "'Content-Type,Authorization'"
      AllowOrigin: "'*'"

# AWS resources that will be created
Resources:
  # API Gateway
  GenAIApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: dev
      EndpointConfiguration: REGIONAL
      Cors:
        AllowMethods: "'*'"
        AllowHeaders: "'Content-Type,Authorization'"
        AllowOrigin: "'*'"

  # Lambda function
  GenAIFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./app/
      Handler: main.handler
      Description: FastAPI GenAI service using OpenAI API
      Policies:
        - AWSLambdaBasicExecutionRole
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - secretsmanager:GetSecretValue
              Resource: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:openai/api_key-*'
      Environment:
        Variables:
          OPENAI_API_KEY_SECRET_ARN: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:openai/api_key-*'
      Events:
        RootPath:
          Type: Api
          Properties:
            RestApiId: !Ref GenAIApi
            Path: /
            Method: ANY
        GeneratePath:
          Type: Api
          Properties:
            RestApiId: !Ref GenAIApi
            Path: /generate
            Method: ANY

Outputs:
  GenAIApiEndpoint:
    Description: API Gateway endpoint URL for the GenAI service
    Value: !Sub 'https://${GenAIApi}.execute-api.${AWS::Region}.amazonaws.com/dev/'
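Before deploying, you can catch syntax errors in the template early by validating it with the SAM CLI (assuming the SAM CLI is installed):
sam validate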
Deploying the FastAPI Application
Once the template.yaml file is ready, the next step is to deploy your application using AWS SAM. Before deploying, you will need to create a Lambda deployment package that includes both the application code, main.py, and the necessary dependencies.
To make this easier, you will use a bash script (build-sam.sh) to automate the process. This script will create a folder named app where main.py will be copied, and the dependencies from pyproject.toml will be transferred into a requirements.txt file, which works seamlessly with AWS Lambda.
#!/bin/bash
set -e
echo "