Build Along · Module 36 · 6 min build

Build: A Multi-MCP Router

The companion build to Multi-MCP Architecture. Eighty lines, no dependencies: three small servers, the tool-bloat problem they create together, and the router that fixes it. Route first, expose second.

What you’ll build

  • Three tiny “MCP servers”, each exposing two tools.
  • A router that picks only the relevant server(s) for a request.
  • A demo that contrasts “expose everything” (6 tools) with “route first” (2 tools).
  • The lesson: tool inventory grows linearly with every server, and the model reads all of it on every turn.

§ 00 · THE PROBLEM · Many small servers, one drowning agent

The MCP advice is right: build many small, focused servers instead of one monolith. But there’s a catch nobody mentions until they hit it — connect them all and the agent has to read every tool’s description on every turn.

Tool bloat is the context and accuracy cost of exposing too many tools to an agent at once. Each tool's name and description consume tokens, and a long tool list measurably degrades the model's ability to pick the right one. It is the real ceiling on multi-server setups. Three servers is six tools, which is fine. Thirty servers is sixty-plus tools, and selection accuracy falls off a cliff.
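To make the context cost concrete, here is a rough sketch that counts how much schema text the model would read per turn as servers are added. Everything in it is an assumption for illustration: the placeholder server and tool names, the description text, and especially the four-characters-per-token ratio, which is a crude heuristic and not a real tokenizer.

```python
# Rough sketch: how schema text grows with server count.
# The 4-characters-per-token ratio is a crude heuristic, not a real tokenizer.

def make_schemas(n_servers, tools_per_server=2):
    """Build placeholder tool schemas for n_servers hypothetical servers."""
    return [
        {
            "name": f"server{i}.tool{j}",
            "description": f"Placeholder description for tool {j} of server {i}.",
        }
        for i in range(n_servers)
        for j in range(tools_per_server)
    ]

def approx_tokens(schemas, chars_per_token=4):
    """Estimate tokens as total name + description characters over the ratio."""
    chars = sum(len(s["name"]) + len(s["description"]) for s in schemas)
    return chars // chars_per_token

for n in (3, 30):
    schemas = make_schemas(n)
    print(f"{n} servers -> {len(schemas)} tools, ~{approx_tokens(schemas)} tokens per turn")
```

Real descriptions are usually longer than these placeholders and real schemas also carry parameter definitions, so treat the numbers as a lower bound on the trend rather than a measurement.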

§ 01 · THREE TINY SERVERS · A server is just a named bundle of tools

We’re not implementing the wire protocol — we’re isolating the architecture lesson. Each server exposes its tool schemas (the part the model actually sees) and can call a tool by name.

class Server:
    def __init__(self, name, tools):
        self.name = name
        self.tools = tools  # [{name, description, fn}]

    def tool_schemas(self):
        # what the agent sees for each tool — this is the context cost
        return [
            {"name": f"{self.name}.{t['name']}", "description": t["description"]}
            for t in self.tools
        ]

    def call(self, tool_name, **kwargs):
        for t in self.tools:
            if t["name"] == tool_name:
                return t["fn"](**kwargs)
        return f"Error: {self.name} has no tool '{tool_name}'"

weather  = Server("weather",  [...])   # current, forecast
docs     = Server("docs",     [...])   # search, get
database = Server("database", [...])   # query, schema
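The tool lists above are elided. As one possible way to fill in a single server, the sketch below builds a weather server with stub tools that return canned strings. The `Server` class is repeated so the block runs on its own. The tool names `current` and `forecast` come from the comment above, while the descriptions, the `city` parameter, and the stub functions are hypothetical.

```python
class Server:
    """Same shape as the Server class above, repeated so this sketch runs alone."""

    def __init__(self, name, tools):
        self.name = name
        self.tools = tools  # list of {"name", "description", "fn"} dicts

    def tool_schemas(self):
        # Namespaced name plus description: the part the model reads.
        return [
            {"name": f"{self.name}.{t['name']}", "description": t["description"]}
            for t in self.tools
        ]

    def call(self, tool_name, **kwargs):
        for t in self.tools:
            if t["name"] == tool_name:
                return t["fn"](**kwargs)
        return f"Error: {self.name} has no tool '{tool_name}'"


# Hypothetical stub tools: canned strings instead of real API calls.
def current(city):
    return f"{city}: 18C"


def forecast(city):
    return f"{city}: 18C, 20C, 17C"


weather = Server("weather", [
    {"name": "current", "description": "Current temperature for a city.", "fn": current},
    {"name": "forecast", "description": "Three-day forecast for a city.", "fn": forecast},
])

print(weather.tool_schemas())
print(weather.call("forecast", city="Toronto"))
print(weather.call("radar", city="Toronto"))  # unknown tool returns an error string
```

Note that `tool_schemas()` never includes `fn`. The model only sees names and descriptions, which is why those two fields are the whole context cost in this build.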

§ 02 · THE ROUTER · Decide which servers are even relevant

The fix is to pick servers before exposing tools. Here the routing is keyword-based for clarity; in production you’d use a small classifier or a cheap model call. The shape is identical.

# KEYWORDS is assumed to be a module-level dict: server name -> list of lowercase trigger words.
def route(request, servers):
    """Return only the servers relevant to this request."""
    request_l = request.lower()
    picked = [
        s for s in servers
        if any(kw in request_l for kw in KEYWORDS.get(s.name, []))
    ]
    return picked or servers  # fall back to all if nothing matched

def inventory(servers):
    return [schema for s in servers for schema in s.tool_schemas()]
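The router depends on a `KEYWORDS` table that is not shown. The sketch below supplies a hypothetical one and runs `route` against a matching and a non-matching request. The keyword lists, the `StubServer` stand-in, and the simplified `inventory` that returns bare names are assumptions for illustration, so the real demo may differ.

```python
from collections import namedtuple

# Minimal stand-in for a server: only the fields route() and inventory() need.
StubServer = namedtuple("StubServer", ["name", "tool_names"])

# Hypothetical keyword table; the real demo's keywords may differ.
KEYWORDS = {
    "weather": ["weather", "forecast", "temperature"],
    "docs": ["docs", "documentation", "manual"],
    "database": ["sql", "table", "query"],
}


def route(request, servers):
    """Return only the servers whose keywords appear in the request."""
    request_l = request.lower()
    picked = [
        s for s in servers
        if any(kw in request_l for kw in KEYWORDS.get(s.name, []))
    ]
    return picked or servers  # fall back to all if nothing matched


def inventory(servers):
    """Simplified inventory: namespaced tool names only."""
    return [f"{s.name}.{t}" for s in servers for t in s.tool_names]


servers = [
    StubServer("weather", ["current", "forecast"]),
    StubServer("docs", ["search", "get"]),
    StubServer("database", ["query", "schema"]),
]

matched = route("What's the weather forecast for Toronto?", servers)
print(inventory(matched))          # only the weather tools

unmatched = route("Tell me a joke", servers)
print(len(inventory(unmatched)))   # no keyword hit, so every server is exposed
```

Two properties are worth noticing. The fallback means a request that matches nothing pays the full inventory cost, so the keyword table decides how often routing helps. And because the match is a plain substring test, a keyword like `table` would also fire on "vegetable", which is one reason to swap in a classifier once the prototype works.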

§ 03 · RUN IT · Six tools vs. two

$ python demo.py
User request: What's the weather forecast for Toronto?

WITHOUT routing (connect everything):
  tools in inventory: 6
WITH routing (scope to relevant servers):
  tools in inventory: 2
Result:
  Toronto: 18C, 20C, 17C

At three servers the win is small. The point is that it scales: every server you add is more inventory the model reads forever, unless a router stands in front of it.
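The `Result:` line in the run above implies one more step: once the model picks a namespaced tool such as `weather.forecast`, something has to split that name and forward the call to the right server. The demo's own code for this is not shown, so the `dispatch` helper below is a hypothetical sketch, as are `StubServer` and the canned forecast string.

```python
class StubServer:
    """Hypothetical stand-in exposing the same call() interface as Server."""

    def __init__(self, name, tools):
        self.name = name
        self.tools = tools  # dict of tool name -> function

    def call(self, tool_name, **kwargs):
        fn = self.tools.get(tool_name)
        if fn is None:
            return f"Error: {self.name} has no tool '{tool_name}'"
        return fn(**kwargs)


def dispatch(qualified_name, servers, **kwargs):
    """Split 'server.tool' and forward the call to the matching server."""
    server_name, _, tool_name = qualified_name.partition(".")
    for s in servers:
        if s.name == server_name:
            return s.call(tool_name, **kwargs)
    return f"Error: no server named '{server_name}'"


weather = StubServer("weather", {"forecast": lambda city: f"{city}: 18C, 20C, 17C"})
routed = [weather]  # pretend the router picked only the weather server

print(dispatch("weather.forecast", routed, city="Toronto"))
print(dispatch("database.query", routed, sql="SELECT 1"))  # server was not routed
```

Dispatching against the routed list rather than the full list means a tool from an unselected server cannot be called by accident in that turn, which keeps the exposed inventory and the callable inventory in sync.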

CHECK · Your agent connects to 25 MCP servers and tool-selection accuracy has dropped. What's the highest-leverage fix?
