Set up and integrate Hugging Face's MCP-powered Tiny Agent into your backend server as an API

Build an MCP-powered AI Agent API in Your Backend with Hugging Face
Have you ever wanted to plug tool-using AI agents into your backend like magic? Thanks to the Model Context Protocol (MCP) and Hugging Face's new MCP client, it's now easier than ever to do just that, in about 50 lines of code.
In this post, we’ll go from installing the MCP Agent locally to wrapping it in an API using Node.js, so your backend can use it like any other service.
What’s MCP?
MCP (Model Context Protocol) is an emerging standard that lets LLMs access external tools like web browsers, file systems, or even your own microservices. It's model-agnostic and fully open-source.
Think of it as a plug-and-play layer between LLMs and real-world functions. Hugging Face's @huggingface/mcp-client lets you spin up an agent that speaks MCP and uses tools exposed by local MCP servers.
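Concretely, each MCP server advertises its tools as plain metadata: a name, a description, and a JSON Schema for the inputs. A hypothetical write_file tool description might look like the sketch below (illustrative only, not copied from any particular server):
// Illustrative shape of an MCP tool description (hypothetical "write_file" tool)
const writeFileTool = {
  name: "write_file",
  description: "Write text content to a file at the given path",
  inputSchema: {
    type: "object",
    properties: {
      path: { type: "string" },
      content: { type: "string" },
    },
    required: ["path", "content"],
  },
};
The agent forwards these descriptions to the LLM, which then decides when and how to call each tool.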
Quickstart: Run the Agent Locally
To try it out:
npx @huggingface/mcp-client
Or using pnpm:
pnpx @huggingface/mcp-client
You’ll see it connect to:
- a local file system server (interacts with files on your Desktop)
- a Playwright browser server (opens URLs using Chromium)
Then it’ll ask you what to do—something like:
“Write a haiku about Hugging Face and save it as hf.txt on my Desktop.”
Behind the scenes, this is all powered by a loop that connects an LLM to MCP tools. It’s simple, but powerful.
Installing the MCP Client in Your Backend
First, add the package to your backend project:
npm install @huggingface/mcp-client
Or with pnpm:
pnpm add @huggingface/mcp-client
You’ll also need:
- Node.js 18+
- A Hugging Face token (HF_TOKEN)
- Local MCP-compatible tool servers (these are CLI binaries that you spawn as subprocesses)
Wrap It in an API (Express Example)
Let’s build a basic Express server that exposes the agent as an API endpoint.
// server.ts
import express from "express";
import { Agent } from "@huggingface/mcp-client";
import dotenv from "dotenv";
import { join } from "node:path";
import { homedir } from "node:os";

dotenv.config();

const app = express();
app.use(express.json());

const agent = new Agent({
  provider: "nebius",
  model: "Qwen/Qwen2.5-72B-Instruct",
  apiKey: process.env.HF_TOKEN!,
  // Each MCP server is spawned as a local subprocess (stdio transport)
  servers: [
    {
      // Filesystem MCP server, scoped to the Desktop folder
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-filesystem", join(homedir(), "Desktop")],
    },
    {
      // Playwright browser MCP server
      command: "npx",
      args: ["@playwright/mcp@latest"],
    },
  ],
});

(async () => {
  // Discover the tools exposed by the MCP servers once, at startup
  await agent.loadTools();

  app.post("/agent", async (req, res) => {
    const input = req.body?.message;
    if (!input) return res.status(400).send("Missing input");

    const result: string[] = [];
    // agent.run() streams chat-completion chunks and tool results;
    // here we only collect the text the model generates
    for await (const chunk of agent.run(input)) {
      if ("choices" in chunk) {
        const delta = chunk.choices[0]?.delta;
        if (delta?.content) result.push(delta.content);
      }
    }
    res.json({ result: result.join("") });
  });

  const PORT = process.env.PORT || 3000;
  app.listen(PORT, () =>
    console.log(`Agent API running on http://localhost:${PORT}`)
  );
})();
Now you can POST to http://localhost:3000/agent with a prompt like:
{
"message": "Search for the latest Hugging Face models and save links to Desktop."
}
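For a quick smoke test from Node itself, a tiny client could look like the sketch below (the client.ts file name is just an assumption; curl or Postman works the same way, and fetch is built into Node 18+):
// client.ts (hypothetical) — quick smoke test for the /agent endpoint
(async () => {
  const res = await fetch("http://localhost:3000/agent", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      message: "Search for the latest Hugging Face models and save links to Desktop.",
    }),
  });
  console.log(await res.json()); // { result: "..." }
})();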
Under the Hood: What's Happening
The agent is:
- Loading tools from the MCP servers (the filesystem and Playwright servers above)
- Sending your prompt to an LLM
- Letting the LLM decide what tools to call
- Executing those tools and feeding the result back
- Looping until the task is complete
This is all managed by the Agent class, built on top of InferenceClient and @modelcontextprotocol/sdk.
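If you're curious what that loop looks like, here is a hand-written sketch of the idea, not the library's actual source; callLLM and executeTool are hypothetical stand-ins for the InferenceClient chat-completion call and the MCP tool invocation:
// Hand-written sketch of the agent loop (not the library's real implementation)
type ToolCall = { id: string; name: string; arguments: string };
type Message = {
  role: "user" | "assistant" | "tool";
  content: string;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
};

// Hypothetical helpers standing in for the LLM call and the MCP tool invocation
declare function callLLM(messages: Message[]): Promise<Message>;
declare function executeTool(name: string, args: string): Promise<string>;

async function agentLoop(messages: Message[]): Promise<string> {
  while (true) {
    // 1. Ask the LLM for its next step, given the conversation so far
    const reply = await callLLM(messages);
    messages.push(reply);

    // 2. No tool calls requested: the task is complete
    if (!reply.tool_calls?.length) return reply.content;

    // 3. Execute each requested tool via its MCP server
    for (const call of reply.tool_calls) {
      const result = await executeTool(call.name, call.arguments);
      // 4. Feed the tool result back so the model can keep going
      messages.push({ role: "tool", tool_call_id: call.id, content: result });
    }
  }
}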
Bonus: Custom Tools
You can build your own MCP server (it’s just a CLI with tool metadata and input/output JSON) and plug it into the same architecture. Think internal services, databases, or even IoT control panels.
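As a starting point, here is a minimal sketch of such a server built on the official TypeScript SDK (@modelcontextprotocol/sdk) with zod for the input schema. The get_order_status tool and its canned reply are made up for illustration; swap in a call to your real internal service, then point a servers entry at the script exactly like the filesystem and Playwright entries above:
// order-status-server.ts (hypothetical) — a tiny custom MCP server
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "internal-tools", version: "1.0.0" });

// Expose one tool: the agent will see its name, description, and input schema
server.tool(
  "get_order_status",
  "Look up the shipping status of an order by its ID",
  { orderId: z.string() },
  async ({ orderId }) => {
    // Replace this canned answer with a call to your real internal API
    return { content: [{ type: "text", text: `Order ${orderId}: shipped` }] };
  }
);

// Communicate over stdio so the agent can spawn this script as a subprocess
(async () => {
  await server.connect(new StdioServerTransport());
})();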
Next Steps
- Swap in different models, e.g. mistralai/Mistral-Small-3.1-24B-Instruct
- Run inference on your own infrastructure
- Build tools for internal microservices
- Add authentication, rate limiting, and caching
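For instance, swapping the model or inference provider is just a constructor change; the MODEL_ID and PROVIDER environment variable names below are only a suggested convention, not something the library requires:
// Make the model and provider configurable via environment variables
const agent = new Agent({
  provider: process.env.PROVIDER ?? "nebius",
  model: process.env.MODEL_ID ?? "Qwen/Qwen2.5-72B-Instruct",
  apiKey: process.env.HF_TOKEN!,
  servers, // same servers array as in server.ts above
});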
Links
- GitHub Repo
- MCP Protocol Docs
- OpenAI’s Function Calling Format
Got questions or ideas for new tools? Drop them in the comments!