Set up and integrate Hugging Face's MCP-powered Tiny Agent into your backend server as an API

Build an MCP-powered AI Agent API in Your Backend with Hugging Face
Have you ever wanted to plug tool-using AI agents into your backend like magic? Thanks to the Model Context Protocol (MCP) and Hugging Face's new MCP client, it's now easier than ever to do just that, in about 50 lines of code.
In this post, we’ll go from installing the MCP Agent locally to wrapping it in an API using Node.js, so your backend can use it like any other service.
What’s MCP?
MCP (Model Context Protocol) is an emerging standard that lets LLMs access external tools like web browsers, file systems, or even your own microservices. It's model-agnostic and fully open-source.
Think of it as a plug-and-play layer between LLMs and real-world functions. Hugging Face's @huggingface/mcp-client lets you spin up an agent that speaks MCP and uses tools exposed by local MCP servers.
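Concretely, each MCP server advertises its tools as plain metadata: a name, a description, and a JSON Schema for the inputs. A hypothetical write_file tool description might look like the sketch below (illustrative only, not copied from any particular server):
// Illustrative shape of an MCP tool description (hypothetical "write_file" tool)
const writeFileTool = {
  name: "write_file",
  description: "Write text content to a file at the given path",
  inputSchema: {
    type: "object",
    properties: {
      path: { type: "string" },
      content: { type: "string" },
    },
    required: ["path", "content"],
  },
};
The agent forwards these descriptions to the LLM, which then decides when and how to call each tool.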
Quickstart: Run the Agent Locally
To try it out:
npx @huggingface/mcp-client
Or using pnpm:
pnpx @huggingface/mcp-client
You’ll see it connect to:
- a local file system server (interacts with files on your Desktop)
- a Playwright browser server (opens URLs using Chromium)
Then it’ll ask you what to do—something like:
“Write a haiku about Hugging Face and save it as hf.txt on my Desktop.”
Behind the scenes, this is all powered by a loop that connects an LLM to MCP tools. It’s simple, but powerful.
Installing the MCP Client in Your Backend
First, add the package to your backend project:
npm install @huggingface/mcp-client
Or with pnpm:
pnpm add @huggingface/mcp-client
You’ll also need:
- Node.js 18+
- A Hugging Face token (HF_TOKEN)
- Local MCP-compatible tool servers (these are CLI binaries that you spawn as subprocesses)
Wrap It in an API (Express Example)
Let’s build a basic Express server that exposes the agent as an API endpoint.
// server.ts
import express from "express";
import { Agent } from "@huggingface/mcp-client";
import dotenv from "dotenv";
import { join } from "node:path";
import { homedir } from "node:os";

dotenv.config();

const app = express();
app.use(express.json());

const agent = new Agent({
  provider: "nebius",
  model: "Qwen/Qwen2.5-72B-Instruct",
  apiKey: process.env.HF_TOKEN!,
  // Each MCP server is spawned as a local subprocess (stdio transport)
  servers: [
    {
      // Filesystem MCP server, scoped to the Desktop folder
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-filesystem", join(homedir(), "Desktop")],
    },
    {
      // Playwright browser MCP server
      command: "npx",
      args: ["@playwright/mcp@latest"],
    },
  ],
});

(async () => {
  // Discover the tools exposed by the MCP servers once, at startup
  await agent.loadTools();

  app.post("/agent", async (req, res) => {
    const input = req.body?.message;
    if (!input) return res.status(400).send("Missing input");

    const result: string[] = [];
    // agent.run() streams chat-completion chunks and tool results;
    // here we only collect the text the model generates
    for await (const chunk of agent.run(input)) {
      if ("choices" in chunk) {
        const delta = chunk.choices[0]?.delta;
        if (delta?.content) result.push(delta.content);
      }
    }
    res.json({ result: result.join("") });
  });

  const PORT = process.env.PORT || 3000;
  app.listen(PORT, () =>
    console.log(`Agent API running on http://localhost:${PORT}`)
  );
})();
Now you can POST to http://localhost:3000/agent with a prompt like:
{
"message": "Search for the latest Hugging Face models and save links to Desktop."
}
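For a quick smoke test from Node itself, a tiny client could look like the sketch below (the client.ts file name is just an assumption; curl or Postman works the same way, and fetch is built into Node 18+):
// client.ts (hypothetical) — quick smoke test for the /agent endpoint
(async () => {
  const res = await fetch("http://localhost:3000/agent", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      message: "Search for the latest Hugging Face models and save links to Desktop.",
    }),
  });
  console.log(await res.json()); // { result: "..." }
})();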
Under the Hood: What's Happening
The agent is:
- Loading tools from the MCP servers (the filesystem and Playwright servers above)
- Sending your prompt to an LLM
- Letting the LLM decide what tools to call
- Executing those tools and feeding the result back
- Looping until the task is complete
This is all managed by the Agent class, built on top of InferenceClient and @modelcontextprotocol/sdk.
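If you're curious what that loop looks like, here is a hand-written sketch of the idea, not the library's actual source; callLLM and executeTool are hypothetical stand-ins for the InferenceClient chat-completion call and the MCP tool invocation:
// Hand-written sketch of the agent loop (not the library's real implementation)
type ToolCall = { id: string; name: string; arguments: string };
type Message = {
  role: "user" | "assistant" | "tool";
  content: string;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
};

// Hypothetical helpers standing in for the LLM call and the MCP tool invocation
declare function callLLM(messages: Message[]): Promise<Message>;
declare function executeTool(name: string, args: string): Promise<string>;

async function agentLoop(messages: Message[]): Promise<string> {
  while (true) {
    // 1. Ask the LLM for its next step, given the conversation so far
    const reply = await callLLM(messages);
    messages.push(reply);

    // 2. No tool calls requested: the task is complete
    if (!reply.tool_calls?.length) return reply.content;

    // 3. Execute each requested tool via its MCP server
    for (const call of reply.tool_calls) {
      const result = await executeTool(call.name, call.arguments);
      // 4. Feed the tool result back so the model can keep going
      messages.push({ role: "tool", tool_call_id: call.id, content: result });
    }
  }
}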
Bonus: Custom Tools
You can build your own MCP server (it’s just a CLI with tool metadata and input/output JSON) and plug it into the same architecture. Think internal services, databases, or even IoT control panels.
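As a starting point, here is a minimal sketch of such a server built on the official TypeScript SDK (@modelcontextprotocol/sdk) with zod for the input schema. The get_order_status tool and its canned reply are made up for illustration; swap in a call to your real internal service, then point a servers entry at the script exactly like the filesystem and Playwright entries above:
// order-status-server.ts (hypothetical) — a tiny custom MCP server
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "internal-tools", version: "1.0.0" });

// Expose one tool: the agent will see its name, description, and input schema
server.tool(
  "get_order_status",
  "Look up the shipping status of an order by its ID",
  { orderId: z.string() },
  async ({ orderId }) => {
    // Replace this canned answer with a call to your real internal API
    return { content: [{ type: "text", text: `Order ${orderId}: shipped` }] };
  }
);

// Communicate over stdio so the agent can spawn this script as a subprocess
(async () => {
  await server.connect(new StdioServerTransport());
})();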
Next Steps
- Swap in different models, e.g. mistralai/Mistral-Small-3.1-24B-Instruct
- Run inference on your own infrastructure
- Build tools for internal microservices
- Add authentication, rate limiting, and caching
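For instance, swapping the model or inference provider is just a constructor change; the MODEL_ID and PROVIDER environment variable names below are only a suggested convention, not something the library requires:
// Make the model and provider configurable via environment variables
const agent = new Agent({
  provider: process.env.PROVIDER ?? "nebius",
  model: process.env.MODEL_ID ?? "Qwen/Qwen2.5-72B-Instruct",
  apiKey: process.env.HF_TOKEN!,
  servers, // same servers array as in server.ts above
});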
Links
- GitHub Repo
- MCP Protocol Docs
- OpenAI’s Function Calling Format
Got questions or ideas for new tools? Drop them in the comments!