Building a Scalable Multi-Tenant Generative AI Platform with Amazon Bedrock : Part 1

Streamlining Bedrock Application Inference Profile Creation with Service Catalog

In today’s dynamic cloud environments, managing resources in a multi-tenant scenario requires a standardized approach along with business governance and cost control that empowers project teams to work autonomously. In this series, we explore how to build a cost-effective and manageable multi-tenant Generative AI platform using Amazon Bedrock. In Part 1, we focus on the challenges of managing Bedrock Inference Profiles and how AWS Service Catalog offers an elegant solution to provision these profiles in a controlled and standardized manner.

Introduction

Managing application inference profiles for Amazon Bedrock in a multi-tenant environment can be challenging. Issues such as inconsistent resource provisioning, cost tracking, and ensuring that only approved configurations are used across different departments can quickly lead to operational overhead. AWS Service Catalog addresses these challenges by enabling you to create pre-approved CloudFormation templates that encapsulate your organizational best practices. With Service Catalog, you can:

  • Standardize Provisioning: Enforce common configurations across all teams.
  • Enhance Governance: Track resource deployments and apply organizational policies consistently.
  • Improve Cost Visibility: Integrate with CloudWatch dashboards to monitor resource usage and spending.

Solution Overview

The solution architecture leverages several AWS services to create an automated, secure, and scalable provisioning process for Bedrock Application Inference Profiles:

  • AWS Service Catalog: Acts as the central provisioning mechanism: a pre-approved CloudFormation template vends Bedrock Application Inference Profiles on demand.
  • AWS Systems Manager (SSM) Parameter Store: Stores the ARNs of the created inference profiles, which can be securely retrieved by applications during runtime.
  • Amazon CloudWatch: Provides dashboards for monitoring cost and usage metrics, ensuring that resource consumption remains transparent and accountable.

The following diagram illustrates the high-level architecture:

Bedrock Application Inference Profile Provisioning Product

Step-by-Step Implementation

1. Creating the Bedrock Application Inference Profile Template (CloudFormation)

Before creating any Service Catalog product, you need to have a robust CloudFormation template. This template should include all the resources required to deploy a Bedrock Application Inference Profile along with supporting resources for governance and cost tracking. For example:

  • Application Inference Profile Resource: Creates the Bedrock Application Inference Profile. Parameters (e.g., ProjectName, Department, ModelArns) let users select the underlying Bedrock foundation model and customize the profile name and tags.
  • SSM Parameter Resource: Stores the ARN of the Application Inference Profile so downstream applications can securely retrieve it.
  • Budget Alert Resource (Optional): Configures AWS Budgets or CloudWatch alarms to track spending and notify when cost thresholds are reached.

CloudFormation Template YAML:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::LanguageExtensions
Parameters:
  ProjectName:
    Type: String
    Description: "Name of the project"
  Department:
    Type: String
    AllowedValues: [marketing, sales, engineering]
    Description: "Department name"
  ModelArns:
    Type: String
    Description: "Select a model ARN from the allowed list"
    AllowedValues:
      - arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0
      - arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-mid-v1
      - arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-ultra-v1
      - arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-premier-v1:0
    Default: arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0

Resources:
  InferenceProfile:
    Type: AWS::Bedrock::ApplicationInferenceProfile
    Properties:
      InferenceProfileName: !Sub "${Department}-${ProjectName}-Profile"
      ModelSource:
        CopyFrom: !Ref ModelArns  # Use the selected model ARN
      Tags:
        - Key: CostCenter
          Value: !Ref Department
        - Key: Project
          Value: !Ref ProjectName

  ProfileArnParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: !Sub "/bedrock/${Department}/${ProjectName}/profile-arn"
      Type: String
      Value: !GetAtt InferenceProfile.InferenceProfileArn
      Tags:
        Project: !Ref ProjectName
        CostCenter: !Ref Department

  BudgetAlert:
    Type: AWS::Budgets::Budget
    Properties:
      Budget:
        BudgetName: !Sub "${ProjectName}-Bedrock-Budget"
        BudgetLimit:
          Amount: 1000
          Unit: USD
        BudgetType: COST
        TimeUnit: MONTHLY
        CostFilters:
          TagKeyValue:
            - !Sub "Project$${ProjectName}"  # Use TagKeyValue for tag-based filtering
      NotificationsWithSubscribers:
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 80
            ThresholdType: PERCENTAGE
          Subscribers:
            - SubscriptionType: EMAIL
              Address: !Sub "${ProjectName}-admin@example.com"

This template is the foundation of your Service Catalog product. It enables standardized resource creation and enforces governance by embedding project-specific parameters.
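
As a quick sanity check before packaging, you can validate the template locally with the AWS CLI. The file name below is an assumption; adjust it to wherever you saved the template:

aws cloudformation validate-template \
    --template-body file://bedrock-inference-profile.yaml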

2. Creating the Service Catalog Product

With the CloudFormation template in hand, the next step is to package it as a Service Catalog product. This product encapsulates the logic defined in your template and makes it available for self-service provisioning.

  1. Navigate to the AWS Service Catalog console: Open the AWS Service Catalog console at https://console.aws.amazon.com/servicecatalog/.
  2. Create a portfolio, e.g., genai.

Create Portfolio

3. Create a New Product:

  • Product Details:
      • Product Name: e.g., Bedrock Inference Profile Provisioner
      • Product Description: Explain that this product provisions a standardized Bedrock Application Inference Profile for multi-tenant GenAI workloads.
      • Owner: e.g., Cloud Center of Excellence (CCoE) or IT
  • Version Details:
      • Choose Use a CloudFormation template.
      • Either upload the local CloudFormation template file (sample above) or provide an Amazon S3 URL.
      • Enter a Version Name (e.g., v1.0) and a short description.
  • Support Details: Optionally, add an email contact and a support URL for troubleshooting.
  • Provisioning Parameters: The product should expose the ProjectName, ModelArns, and Department parameters, which users fill in during provisioning.

Finalize the Product:

Click Create product to save your product. It will now appear on the product list.

Create Product

Click Grant Access: There are two ways to grant access to your portfolio. You can grant access to IAM principals (groups, roles, or users) in your own account, or, if you want to share the portfolio with other accounts, you can specify principal names (groups, roles, or users) to be shared along with the portfolio.

Access to Portfolio products

Now the Bedrock Inference Profile Provisioner should be available under Products to the granted IAM principals (groups, roles or users).

Products ready to provision

This product is now reusable and allows end users to self-service their provisioning needs without deviating from organizational standards.
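
If you prefer to automate this step, the same portfolio and product can be created with the AWS CLI. The following is a sketch only; the S3 URL, role ARN, and the prod-/port- IDs returned by the first two commands are placeholders you would substitute with your own values:

# Create the portfolio; note the PortfolioDetail.Id in the output
aws servicecatalog create-portfolio \
    --display-name genai \
    --provider-name "Cloud Center of Excellence"

# Create the product from the CloudFormation template uploaded to S3
aws servicecatalog create-product \
    --name "Bedrock Inference Profile Provisioner" \
    --owner "CCoE" \
    --product-type CLOUD_FORMATION_TEMPLATE \
    --provisioning-artifact-parameters 'Name=v1.0,Info={LoadTemplateFromURL=https://your-bucket.s3.amazonaws.com/bedrock-inference-profile.yaml},Type=CLOUD_FORMATION_TEMPLATE'

# Add the product to the portfolio and grant an IAM role access to it
aws servicecatalog associate-product-with-portfolio \
    --product-id prod-xxxxxxxxxxxx \
    --portfolio-id port-xxxxxxxxxxxx

aws servicecatalog associate-principal-with-portfolio \
    --portfolio-id port-xxxxxxxxxxxx \
    --principal-arn arn:aws:iam::123456789012:role/ProjectTeamRole \
    --principal-type IAM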

3. Provisioning the Product

Next, project teams can provision their own Bedrock Inference Profiles by:

  • Navigating to the AWS Service Catalog console.
  • Selecting the “Bedrock Inference Profile Provisioner” product.
  • Filling in the required parameters such as ProjectName, ModelArns (select from the drop-down), and Department (drop-down).
  • Launching the product, which triggers the CloudFormation stack creation and provisions all associated resources.
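
The same launch can be scripted with the AWS CLI. The sketch below assumes a product version named v1.0 and reuses the example parameter values from this post; adjust names and values as needed:

aws servicecatalog provision-product \
    --product-name "Bedrock Inference Profile Provisioner" \
    --provisioning-artifact-name v1.0 \
    --provisioned-product-name slack-chatbot-inference-profile \
    --provisioning-parameters \
        Key=ProjectName,Value=slack_chatbot \
        Key=Department,Value=engineering \
        Key=ModelArns,Value=arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0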

Launch Product

Verify the provisioned product details, including the resources created by CloudFormation, and check the AWS resource links under the Physical ID and the CloudFormation stack ARN.
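
You can also inspect the provisioning record from the CLI (this assumes the provisioned product name used in the earlier sketch):

aws servicecatalog describe-provisioned-product \
    --name slack-chatbot-inference-profile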

Provisioned Products

Provisioned Product details

Budget details

SSM Parameter containing the Bedrock Inference Profile ARN
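
Applications (or operators) can then read the profile ARN at runtime from Parameter Store. For example, with the /bedrock/<Department>/<ProjectName>/profile-arn path convention used in the template:

aws ssm get-parameter \
    --name /bedrock/engineering/slack_chatbot/profile-arn \
    --query Parameter.Value \
    --output text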

Note: You can’t view application inference profiles in the Amazon Bedrock console; only system-defined inference profiles appear there. However, you can view an application inference profile using the ‘get-inference-profile’ API. Run the command below to view the details of the application inference profile (replace the placeholder with the SSM parameter value or the InferenceProfile physical ID from the provisioned product’s Resources tab):

aws bedrock get-inference-profile --inference-profile-identifier <inference-profile-arn>

You should see output like the following:

{
    "inferenceProfileName": "engineering-slack_chatbot-Profile",
    "createdAt": "2025-04-02T21:58:52.644591+00:00",
    "updatedAt": "2025-04-02T21:58:52.644591+00:00",
    "inferenceProfileArn": "arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/8gjoyf1x2bn9",
    "models": [
        {
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0"
        }
    ],
    "inferenceProfileId": "8gjoyf1x2bn9",
    "status": "ACTIVE",
    "type": "APPLICATION"
}    
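
If you don’t have the ARN handy, you can also list all application inference profiles in the account and Region:

aws bedrock list-inference-profiles --type-equals APPLICATION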

Conclusion

In Part 1 of our series, we introduced a robust solution for standardizing the provisioning of Amazon Bedrock Inference Profiles using AWS Service Catalog. By encapsulating best practices into a reusable CloudFormation template and leveraging Service Catalog’s governance features, organizations can streamline resource management, enforce policies, and achieve better cost control in multi-tenant environments.

Stay tuned for Part 2, where we dive deeper into runtime security and monitoring to further enhance your multi-tenant GenAI platform: securely retrieving Application Inference Profile ARNs from SSM Parameter Store and using them to invoke Bedrock models, ensuring proper cost allocation and access control.