Implementing Nova Act MCP Server on ECS Fargate


May 1, 2025 - 02:32

Browser Automation as a Service

This technical blog post outlines how to run a Model Context Protocol (MCP) server for Amazon Nova Act as a container on Amazon ECS with Fargate, providing browser automation capabilities as a service.

Introduction

Amazon Nova Act is an early research preview AI model and SDK designed to enable developers to build reliable web agents that can perform actions within a web browser. By integrating Nova Act with the Model Context Protocol (MCP), its browser automation capabilities can be standardized and exposed to diverse clients (e.g., AI assistants, web UIs, IDE extensions) through multiple communication channels:

  • Standard I/O (stdio) for local scripting and CLI tools like Amazon Q Developer CLI & Cline VS Code Extension (open-source)
  • Server-Sent Events (SSE) over an HTTP connection for real-time streaming from server to client (SSE itself is one-way, so client requests travel as ordinary HTTP POSTs)
  • Streamable HTTP transport (introduced in a recent revision of the MCP specification) for scalable web-based interactions

This implementation deploys Nova Act as a containerized service on Amazon ECS behind an Application Load Balancer (ALB) and uses the SSE transport over HTTP for real-time communication.
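
To make the SSE transport more concrete: an SSE stream is plain text in which each event is a group of `field: value` lines terminated by a blank line. The sketch below parses such a stream with the standard library only. The function name `parse_sse` and the sample payload (including the session path) are illustrative assumptions and are not taken from the repository.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event: Dict[str, str] = {}
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if event or data_lines:
                event["data"] = "\n".join(data_lines)
                yield event
            event, data_lines = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            data_lines.append(value)  # multiple data lines are joined with "\n"
        else:
            event[field] = value


# Made-up stream for illustration only
sample = [
    "event: endpoint\n",
    "data: /mcp/messages/?session_id=abc\n",
    "\n",
    ": keep-alive\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
events = list(parse_sse(sample))
```

A real MCP client would read these lines from a long-lived HTTP response instead of a list, but the framing rules are the same.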

Note: At the time of writing, Nova Act API keys can only be requested from within the US, so the solution is deployed in the AWS us-east-1 region to stay compliant with that restriction.


Architecture Overview

The solution consists of the following components:

  1. MCP Server: A FastAPI application that implements the Model Context Protocol and exposes Nova Act functionality by using FastAPI-MCP
  2. Streamlit Client: A web UI and MCP Client for interacting with the MCP server
  3. AWS Infrastructure: ECS Fargate tasks, ALB, and supporting resources

Nova Act MCP Architecture

Deployment Guide

Prerequisites

Before deploying the Nova Act MCP server on ECS, ensure you have:

  1. AWS CLI configured with appropriate credentials
  2. Node.js and npm installed
  3. AWS CDK installed (npm install -g aws-cdk)
  4. Docker installed and running
  5. A valid Nova Act API key

Step 1: Clone the Repository

git clone https://github.com/awsdataarchitect/nova-act-ecs.git
cd nova-act-ecs

Step 2: Export Your API Key as an Environment Variable

export NOVA_ACT_API_KEY=your-api-key-here

Step 3: Deploy the CDK Stack

npm install
cdk bootstrap
cdk deploy

Step 4: Access the Application

After deployment completes, the CDK will output the ALB DNS name. You can access:

  • MCP Server: http://<ALB-DNS-name>/mcp
  • Streamlit UI: http://<ALB-DNS-name>:8501

MCP Server Implementation

The server implements MCP using fastapi_mcp with the following components:

  1. MCP Endpoints (automatically handled by fastapi_mcp):

    • /mcp - SSE endpoint for event streaming
    • /mcp/schema - Schema endpoint for method discovery
    • /mcp/jsonrpc - JSON-RPC endpoint for method calls
  2. Core API Endpoint:

    • /browse - Combined endpoint for all browser automation tasks
  3. Additional Endpoints:

    • /health - Used by ALB for health checks
    • /logs - Endpoint to retrieve recent server logs
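
For orientation, MCP method calls are JSON-RPC 2.0 messages, and a tool invocation uses the `tools/call` method. The sketch below builds such a message with the standard library. The helper name `make_tool_call` is hypothetical, and the tool name `browse` is an assumption based on the `operation_id` used by the `/browse` route; the exact wire format accepted by this server should be checked against the repository.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


payload = make_tool_call(
    1,
    "browse",  # assumed tool name, mirroring operation_id="browse"
    {
        "starting_url": "https://example.com",  # placeholder URL
        "instructions": ["Find the page title"],
    },
)
decoded = json.loads(payload)
```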

The "browse" Method

The implementation uses a simplified approach with a single "browse" method that combines browser control and instruction execution:

@app.post("/browse", operation_id="browse")
async def browse(request: BrowseRequest) -> BrowseResponse:
    """
    Execute a browsing task with Nova Act.
    This method handles browser initialization, navigation, and instruction execution.
    """
    # Implementation details excluded here for brevity...

Request Schema

The browse method accepts a flexible request schema that can handle various browsing scenarios:

class BrowseRequest(BaseModel):
    starting_url: str
    instructions: List[str] = Field(..., description="List of instructions to execute sequentially")
    max_steps_per_instruction: int = 30
    timeout_per_instruction: Optional[int] = None
    schema: Optional[Dict[str, Any]] = None
    headless: bool = True
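
A request body matching this model might look like the following. The URL, the instruction text, and the JSON-Schema-style `schema` value are invented for illustration, and the timeout is assumed to be in seconds; how the server interprets these fields is defined in the repository, not here.

```python
import json

# Hypothetical payload for POST /browse
browse_request = {
    "starting_url": "https://example.com",
    "instructions": [
        "Search for wireless headphones",
        "Is the first result in stock?",
    ],
    "max_steps_per_instruction": 30,  # same as the model default
    "timeout_per_instruction": 120,   # optional; unit assumed to be seconds
    "schema": {"type": "boolean"},    # optional; assumed JSON-Schema style
    "headless": True,                 # same as the model default
}

body = json.dumps(browse_request, indent=2)
print(body)
```

Only `starting_url` and `instructions` are required by the model; the other fields fall back to their defaults when omitted.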

Response Schema

The response includes detailed information about the browsing session:

class BrowseResponse(BaseModel):
    status: str
    results: List[Dict[str, Any]]
    errors: List[Dict[str, Any]] = []
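
One way the `results` and `errors` lists could be filled is sketched below, under stated assumptions: instructions run in order, a failing instruction is recorded and the loop continues, and the `status` strings are made up for this example. `run_instructions` and `fake_act` are hypothetical names; `fake_act` stands in for the real Nova Act call, and the actual server may behave differently.

```python
from typing import Any, Callable, Dict, List


def run_instructions(
    instructions: List[str],
    act: Callable[[str], Any],
) -> Dict[str, Any]:
    """Run instructions in order and build a BrowseResponse-shaped dict."""
    results: List[Dict[str, Any]] = []
    errors: List[Dict[str, Any]] = []
    for index, instruction in enumerate(instructions):
        try:
            output = act(instruction)
            results.append({"instruction": instruction, "output": output})
        except Exception as exc:
            # Record the failure and carry on with the next instruction.
            errors.append(
                {"index": index, "instruction": instruction, "error": str(exc)}
            )
    # Status vocabulary is an assumption, not the repository's.
    if not errors:
        status = "success"
    elif results:
        status = "partial"
    else:
        status = "error"
    return {"status": status, "results": results, "errors": errors}


def fake_act(instruction: str) -> str:
    """Stand-in for a Nova Act call; fails when asked to."""
    if "fail" in instruction:
        raise RuntimeError("simulated failure")
    return f"done: {instruction}"


response = run_instructions(["open menu", "fail here"], fake_act)
```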

Server Features

  • Single Global Browser Instance: The server maintains a single global Nova Act instance
  • Headless Mode: Browser always runs in headless mode for ECS compatibility
  • API Key Management: Retrieves API key from environment variables or AWS Secrets Manager
  • Structured Data Extraction: Supports schema-based data extraction
  • Error Handling: Comprehensive error handling and logging
  • Thread Pool Execution: Runs synchronous Nova Act code in a thread pool to avoid asyncio conflicts
  • Resource Monitoring: Monitors system resources (CPU, memory) for debugging
  • Log Buffering: Maintains a circular buffer of recent logs for client display
  • Console Output Capture: Captures all stdout/stderr output including Nova Act's thinking steps
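
The API key lookup described in the list above can be sketched as an environment-first resolution with an optional fallback. The function name `resolve_api_key` and the injectable `fetch_secret` callable are assumptions made to keep the example free of AWS dependencies; in a real deployment that callable would typically wrap a Secrets Manager call.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    fetch_secret: Optional[Callable[[], str]] = None,
) -> str:
    """Return the key from NOVA_ACT_API_KEY, else from a secret fetcher."""
    key = env.get("NOVA_ACT_API_KEY", "").strip()
    if key:
        return key
    if fetch_secret is not None:
        # In AWS, this callable would usually wrap boto3's Secrets Manager
        # client and return the "SecretString" of get_secret_value(...).
        return fetch_secret()
    raise RuntimeError("Nova Act API key is not configured")


from_env = resolve_api_key({"NOVA_ACT_API_KEY": "env-key"})
from_secret = resolve_api_key({}, fetch_secret=lambda: "secret-key")
```

Checking the environment first keeps local runs simple, while the fallback lets the ECS task avoid carrying the key in its task definition.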

Key Implementation Details

The server uses a thread pool to run synchronous Nova Act code without blocking the FastAPI event loop:

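# Execute the browse sequence in a thread pool (excerpt from inside the async handler)
logger.info("Running browse sequence in thread pool")
# get_running_loop() is the idiomatic way to obtain the loop from within a coroutine
browse_result = await asyncio.get_running_loop().run_in_executor(
    thread_pool, run_browse_sequence
)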
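
The excerpt above can be tried in isolation with the following self-contained sketch. Here `run_browse_sequence` is a stand-in that merely sleeps, `handler` is a hypothetical coroutine, and the single-worker pool is an assumption chosen to mirror the single global browser instance.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# One worker, so blocking jobs run one at a time (assumption)
thread_pool = ThreadPoolExecutor(max_workers=1)


def run_browse_sequence() -> str:
    """Stand-in for the blocking Nova Act work."""
    time.sleep(0.05)
    return threading.current_thread().name


async def handler() -> dict:
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(thread_pool, run_browse_sequence)
    # The event loop keeps running while the worker thread is blocked.
    ticks = 0
    while not future.done():
        await asyncio.sleep(0.01)
        ticks += 1
    return {"worker": await future, "ticks": ticks}


outcome = asyncio.run(handler())
thread_pool.shutdown(wait=True)
```

The `ticks` counter shows that the coroutine was scheduled repeatedly while the blocking call was in progress, which is the property that keeps other requests such as `/health` responsive.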

The server also implements a log capture mechanism to provide real-time logs to clients, including stdout/stderr interception to capture Nova Act's thinking process:

import collections
import threading
from io import StringIO

# Log buffer implementation
class LogBuffer:
    def __init__(self, max_size=1000):
        self.logs = collections.deque(maxlen=max_size)
        self.lock = threading.Lock()

    def add(self, log_entry):
        with self.lock:
            self.logs.append(log_entry)

    def get_logs(self, limit=100):
        with self.lock:
            return list(self.logs)[-limit:]

# Custom stdout/stderr interceptor to capture Nova Act outputs
class OutputInterceptor(StringIO):
    def __init__(self, log_buffer, stream_name, original_stream):
        super().__init__()
        self.log_buffer = log_buffer
        self.stream_name = stream_name
        self.original_stream = original_stream

    def write(self, text):
        # Write to the original stream
        self.original_stream.write(text)

        # Add to log buffer if not empty
        if text.strip():
            self.log_buffer.add(text.rstrip())

    def flush(self):
        self.original_stream.flush()

# Log endpoint
@app.get("/logs")
async def get_logs(limit: int = 100):
    return {"logs": log_buffer.get_logs(limit)}
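
The interplay between the bounded buffer and the stream interceptor can be seen in a compact, standalone form. `TeeToBuffer` below is a simplified stand-in for `OutputInterceptor`, not the repository's class, and a bare `deque` replaces `LogBuffer` for brevity (the lock is omitted because the example is single-threaded).

```python
import collections
import contextlib
import io
import sys


class TeeToBuffer(io.TextIOBase):
    """Forward writes to the original stream and record non-empty ones."""

    def __init__(self, records, original):
        self.records = records
        self.original = original

    def write(self, text):
        self.original.write(text)
        if text.strip():
            self.records.append(text.rstrip())
        return len(text)


# maxlen makes the deque circular: old entries are dropped automatically
recent = collections.deque(maxlen=3)
with contextlib.redirect_stdout(TeeToBuffer(recent, sys.stdout)):
    for step in range(5):
        print(f"step {step}")

captured = list(recent)
```

After five prints, only the three newest lines remain in the buffer, which is the behaviour that keeps the `/logs` response bounded.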

For the full server implementation, see the GitHub repository.

MCP Client Implementation

The client implementation provides a Python interface to the Nova Act MCP server. I've implemented a synchronous version based on requests, as it's more stable in the Streamlit environment.

Client Features

  1. Connection Management:

    • Connects to the server's health endpoint to verify availability
    • Manages an HTTP session for all requests
    • Handles connection errors gracefully
  2. API Method:

    • browse(starting_url, instructions, max_steps_per_instruction, timeout_per_instruction, schema, headless) - Execute a browsing task
  3. Error Handling:

    • Proper error propagation
    • Detailed error messages
    • Connection retry logic
  4. Log Retrieval:

    • get_logs(limit) - Retrieve recent server logs
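
The connection retry logic mentioned above is not part of the client excerpt in this post, so the following is only a sketch of one common approach: retrying a callable with exponential backoff. The helper `with_retries`, its parameters, and `flaky_health_check` are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying with exponential backoff on selected errors."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}
delays: List[float] = []


def flaky_health_check() -> str:
    """Fails twice, then succeeds, to exercise the retry path."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server not ready")
    return "ok"


# Passing delays.append as `sleep` records the waits instead of sleeping
result = with_retries(flaky_health_check, sleep=delays.append)
```

When wrapping calls made with the requests library, `retry_on` would need to include `requests.exceptions.ConnectionError`, since that class does not derive from the built-in `ConnectionError`.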

Synchronous Client Implementation

import logging
from typing import Optional, Dict, Any, List, Union

import requests

# Module-level logger used by MCPClient below
logger = logging.getLogger(__name__)

class MCPClient:
    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip('/')
        self.session = requests.Session()
        self.connected = False
        self._current_url = None

    def connect(self) -> bool:
        """Initialize connection to MCP server"""
        try:
            response = self.session.get(f"{self.base_url}/health")
            if response.status_code == 200:
                self.connected = True
                logger.info("Connected to MCP server")
                return True
            return False
        except Exception as e:
            logger.error(f"Connection error: {str(e)}")
            return False

    def browse(self, starting_url: str, instructions: Union[str, List[str]], 
              max_steps_per_instruction: int = 30, 
              timeout_per_instruction: Optional[int] = None,
              schema: Optional[Dict[str, Any]] = None,
              headless: bool = True) -> Dict[str, Any]:
        """Execute a sequence of instructions in a single browser session."""
        # Convert single instruction to list
        if isinstance(instructions, str):
            instructions = [instructions]

        if not self.connected:
            self.connect()

        try:
            data = {
                "starting_url": starting_url,
                "instructions": instructions,
                "max_steps_per_instruction": max_steps_per_instruction,
                "headless": headless
            }

            if timeout_per_instruction:
                data["timeout_per_instruction"] = timeout_per_instruction
            if schema:
                data["schema"] = schema

            logger.info(f"Sending browse request with {len(instructions)} instructions to {starting_url}")
            response = self.session.post(
                f"{self.base_url}/browse",
                json=data
            )

            if response.status_code != 200:
                raise Exception(f"Server returned {response.status_code}: {response.text}")

            result = response.json()
            logger.info(f"Browse request completed with status: {result.get('status')}")

            # Update current URL
            self._current_url = starting_url
            return result
        except Exception as e:
            # Chain the original exception so the root cause stays in the traceback
            raise Exception(f"Error in browse operation: {str(e)}") from e

    def get_logs(self, limit: int = 100) -> List[str]:
        """Get recent logs from the server"""
        if not self.connected:
            self.connect()

        try:
            response = self.session.get(
                f"{self.base_url}/logs?limit={limit}"
            )

            if response.status_code != 200:
                raise Exception(f"Server returned {response.status_code}: {response.text}")

            result = response.json()
            return result.get("logs", [])
        except Exception as e:
            logger.error(f"Error getting logs: {str(e)}")
            return []

For the full client implementation, see the GitHub repository.

Streamlit UI Implementation

The Streamlit UI provides a user-friendly interface to the Nova Act MCP server:

UI Features

  • Single Form Interface: Combines URL and instruction inputs in one form
  • Schema Builder: UI for creating extraction schemas (Boolean, Text, Product Info, List Items, Custom)
  • Execution Options: Configure max steps and timeout
  • Result Display: Formatted display of execution results and parsed responses
  • History Tracking: Maintains a record of previous operations and results
  • Live Logs Display: Shows real-time server logs in a scrollable window
  • Amazon-Specific Examples: Pre-configured examples for common Amazon shopping tasks

Live Logs Display

A key feature of the UI is the live logs display, which shows the server's output in real-time, including Nova Act's thinking process:

# In the Streamlit UI
with st.expander("Server Logs", expanded=True):
    # Add a refresh button and auto-refresh toggle
    col1, col2 = st.columns([1, 5])
    with col1:
        if st.button("