Multi-MCP Architecture
MCP grew from a Claude-only feature into a Linux Foundation standard in 18 months. The patterns that emerged around composing multiple servers are different from what the spec anticipated — and the hidden cost of connecting everything is real.
§ 00 · FROM LOCAL PROTOTYPE TO A STANDARD
18 months from Claude-only to Linux Foundation
Anthropic introduced MCP in late 2024 as a way for Claude Desktop to talk to local tools through a JSON-RPC server. That’s the same shape almost every text-editor plugin had been using for years, lifted into the LLM era. It worked, and it spread fast.
By April 2026 the spec had moved out of Anthropic’s repository and into the Linux Foundation, with 97 million downloads of MCP server packages and over 13,000 public servers in the registry. More important: 10,000+ remote endpoints were registered, meaning the protocol had outgrown its local-only prototype phase. Most production MCP usage is now remote, not stdio.
Two architectural questions matter for any team building on top of this. How do you carve up your servers? And how do you avoid drowning your agent in their combined tool inventory? The rest of this drip covers the answers production teams converged on.
§ 01 · MANY SMALL SERVERS BEAT ONE BIG ONE
Microservices for agents
The first design instinct most teams have is to build one MCP server that exposes every tool the agent might need. CRM, billing, inventory, GitHub, Slack — all one server, 50 tools. It works in a demo. It doesn’t work in production.
Particula Tech’s April 2026 production patterns piece documented the failure mode: agents get measurably worse at routing decisions as the tool inventory grows past ~15-20 distinct tools. The model has to triage which tool fits the current intent; the cost of that triage scales superlinearly with the inventory. Even Sonnet-class models start making sloppy calls — picking the wrong tool, mis-ordering arguments, invoking unrelated tools “just in case.”
The fix is microservices for agents. One server per business domain. Each independently deployable. The orchestrator composes them at the host. Concretely:
- Draw the domains. Model your business the way you’d model microservices: by who owns what, what changes together, what fails together.
- One domain, one server. CRM is its own server; billing is its own server; inventory is its own server. Each exposes 4-8 tools tightly scoped to its domain.
- Compose at the host. The orchestrator (Claude Code, Claude Desktop, your own agent harness) connects to multiple servers and routes calls based on intent.
- Deploy independently. Each server has its own release cycle, its own deploy schedule, its own on-call rotation. Failure isolation is the prize.
The benefits read like the standard microservices argument applied at a new layer — and that’s the right frame. MCP servers are the microservice abstraction for AI tooling. Treat them that way.
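One way to keep the "one domain, one server" rule honest is a small host-side registry that refuses duplicate domains and oversized servers. The sketch below is illustrative only: `ServerSpec`, `build_registry`, the example URLs, and the tool names are all hypothetical, and the cap of 8 simply mirrors the 4-8 tools guidance above rather than anything in the MCP spec.

```python
from dataclasses import dataclass

# Assumption: cap taken from the "4-8 tools" guidance above, not from the MCP spec.
MAX_TOOLS_PER_SERVER = 8


@dataclass(frozen=True)
class ServerSpec:
    """One MCP server, owned by exactly one business domain."""
    domain: str
    url: str  # placeholder URL, not a real endpoint
    tools: tuple[str, ...]


def build_registry(specs: list[ServerSpec]) -> dict[str, ServerSpec]:
    """Index servers by domain; reject duplicate domains and oversized servers."""
    registry: dict[str, ServerSpec] = {}
    for spec in specs:
        if spec.domain in registry:
            raise ValueError(f"domain {spec.domain!r} already has a server")
        if len(spec.tools) > MAX_TOOLS_PER_SERVER:
            raise ValueError(
                f"{spec.domain!r} exposes {len(spec.tools)} tools, "
                f"over the cap of {MAX_TOOLS_PER_SERVER}"
            )
        registry[spec.domain] = spec
    return registry


# Hypothetical domains and tool names, for illustration only.
REGISTRY = build_registry([
    ServerSpec("crm", "https://crm.mcp.example.com",
               ("find_contact", "update_contact", "list_deals", "log_activity")),
    ServerSpec("billing", "https://billing.mcp.example.com",
               ("get_invoice", "list_invoices", "issue_refund", "get_subscription")),
    ServerSpec("inventory", "https://inventory.mcp.example.com",
               ("check_stock", "reserve_item", "list_warehouses", "get_sku")),
])
```

Failing at registry build time is a design choice: a server that quietly grows a ninth tool is caught in review, not in a degraded production agent.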
§ 02 · REMOTE MCP — STREAMABLE HTTP + OAUTH
Stdio was the prototype. HTTP is the platform.
The April 2026 MCP spec normalizes two transports: stdio (the local original) and Streamable HTTP (the platform shape). Stdio was the right primitive for the demo era — your IDE spawns a subprocess, talks to it over stdin/stdout, kills it on shutdown. Streamable HTTP is the right primitive for production — your server runs in the cloud, the client speaks HTTPS + Server-Sent Events to it, and OAuth 2.1 with PKCE handles auth.
The migration is small in code, big in implications. Four moves:
- Switch to HTTP transport. The SDK provides a single drop-in change from StdioServerTransport to StreamableHTTPServerTransport. The handler code is identical.
- Add OAuth 2.1 with PKCE. Validate the bearer token on every request, against the right issuer and audience. RFC 8707 audience binding is the part most teams miss; without it, a token a user obtained for some other MCP server can be replayed against yours.
- Ship to the edge. Cloudflare Workers, Vercel Functions, Supabase Edge Functions: anywhere with low cold-start latency and global presence. The model client wants the server fast and close.
- Register with the client. Add the URL to the client’s MCP config; the OAuth flow handles the rest.
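The audience-binding step in the list above can be sketched as a plain check on token claims. This is a minimal illustration, not a full validator: it assumes the token's signature has already been verified by a JWT library and that `claims` is the decoded payload. The function name `validate_claims` and the example URLs in the usage note are hypothetical.

```python
import time
from typing import Any, Mapping, Optional


def validate_claims(
    claims: Mapping[str, Any],
    expected_issuer: str,
    expected_audience: str,
    now: Optional[float] = None,
) -> None:
    """Raise PermissionError unless the claims were minted for this server.

    Assumes signature verification happened before this call.
    """
    now = time.time() if now is None else now

    if claims.get("iss") != expected_issuer:
        raise PermissionError("token was issued by an unexpected issuer")

    # The JWT "aud" claim may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        # This is the replay case: a token obtained for some other server.
        raise PermissionError("token audience is not this MCP server")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise PermissionError("token is expired or has no expiry")
```

A token whose `aud` is `https://crm.mcp.example.com` is rejected by a server that expects `https://billing.mcp.example.com`, even when the issuer and expiry are fine. That rejection is the replay protection the audience check buys.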
Once the server is remote, the “works on my laptop” discount goes away. Authentication is real. Rate limiting is real. Per-user data isolation is real. RLS-shaped policies in the database become the actual security boundary, with the OAuth-extracted auth.uid() as the discriminator. The work is the work, but the payoff is that your MCP server runs in the same place your team’s other production services do.
§ 03 · THE HIDDEN COST OF MULTI-MCP
More MCPs. Worse agents.
Here’s the counterintuitive result. Once you have the many-small-servers pattern in place and remote MCP working, the next failure mode is the one nobody warned you about: connecting more servers makes the agent worse.
Each MCP server wires its tool descriptions into every model call. Ten servers averaging ~800 tokens of schemas each means roughly 8,000 tokens of tool inventory loaded before your prompt even starts. The model has less budget for the actual task. And worse — even with a long context window, large tool inventories degrade planner accuracy regardless of whether you have token budget left.
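The arithmetic is worth writing down, because the token count and the tool count are separate costs. The sketch below uses the ten-server, ~800-token figures from the paragraph above; the six tools per server is an assumption (inside the 4-8 range suggested earlier), and `inventory_cost` is a hypothetical helper.

```python
# Assumption: upper end of the "~15-20 tools" range cited earlier in this piece.
ROUTING_DEGRADATION_THRESHOLD = 20


def inventory_cost(
    server_count: int, avg_schema_tokens: int, avg_tools_per_server: int
) -> dict[str, int]:
    """Return the schema tokens and tool count the planner sees on every call."""
    return {
        "schema_tokens": server_count * avg_schema_tokens,
        "tool_count": server_count * avg_tools_per_server,
    }


# Ten servers at ~800 schema tokens each (from the text); 6 tools each is an assumption.
cost = inventory_cost(server_count=10, avg_schema_tokens=800, avg_tools_per_server=6)
over_threshold = cost["tool_count"] > ROUTING_DEGRADATION_THRESHOLD
print(cost, over_threshold)
```

Under these assumptions the call carries 8,000 schema tokens and 60 tools. The token figure may be a small slice of a long context window, but the tool count is already about three times the range where routing reportedly starts to slip.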
Four things to put in place:
- The token ledger. Count what each MCP costs you per call. Most clients don’t surface this number; measure it yourself by diffing call sizes with vs without each server connected.
- Understand why it tanks accuracy. The planner has to triage tools by intent on every turn. A 50-tool inventory is a 50-element ranking problem. Even with tool descriptions perfectly written, the planner spends attention on triage that should go to the actual task.
- Lazy-load servers based on task classification. Classify each incoming request into a task class; load only the servers relevant to that class. A support ticket doesn’t need warehouse SQL access; loading the warehouse server’s schemas only hurts.
- Trace per server. Track which servers each task class actually invokes. Drop the dead weight monthly — servers connected but never called are pure overhead.
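The ledger and the per-server trace from the list above need very little machinery. The following is a rough sketch under stated assumptions: request sizes are token counts you have measured yourself, the call log is a list of `(task_class, server)` pairs from your own tracing, and every function name and number here is made up for illustration.

```python
def build_ledger(baseline_tokens: int, tokens_with_server: dict[str, int]) -> dict[str, int]:
    """Per-call token cost of each server: request size with it connected minus the baseline."""
    return {name: total - baseline_tokens for name, total in tokens_with_server.items()}


def servers_used_by_class(call_log: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Group the servers actually invoked under each task class."""
    usage: dict[str, set[str]] = {}
    for task_class, server in call_log:
        usage.setdefault(task_class, set()).add(server)
    return usage


def dead_weight(connected: list[str], call_log: list[tuple[str, str]]) -> list[str]:
    """Servers that are connected but never appear in the call log."""
    used = {server for _task_class, server in call_log}
    return [name for name in connected if name not in used]


# Made-up measurements: the same request sent with no servers, then with one at a time.
ledger = build_ledger(1200, {"crm": 2000, "warehouse": 2600})

# Made-up trace entries: (task_class, server) per tool call.
log = [("support", "crm"), ("data-ops", "warehouse"), ("support", "crm")]
unused = dead_weight(["crm", "warehouse", "slack"], log)
```

With these numbers the ledger shows `crm` costing 800 tokens per call and `warehouse` 1,400, while `slack` turns up as connected but never invoked, which makes it the first candidate for the monthly prune.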
§ 04 · LAZY-LOAD BY TASK CLASS
Classify first. Load second. Trace always.
The lab below makes the dynamics concrete. Toggle ten servers on or off, pick the current task class (support / data-ops / billing / code-review), and switch lazy-load on. The token ledger updates, the task budget moves, and the synthetic planner accuracy responds to tool count.
Two things to feel in the lab. Without lazy-load, every connected server costs you regardless of relevance — the warehouse server is loaded even for a support ticket where nothing in it matters. With lazy-load on, the same set of servers can stay connected without paying their full cost on every turn; the agent only sees the slice it needs.
Implementation-wise, the classifier doesn’t have to be sophisticated. A cheap Haiku-class call (“classify this request into one of: support / data-ops / billing / code-review”) hits 90%+ accuracy on real-world traffic and costs <$0.001 per request. The token savings on the main model call dwarf the classifier cost.
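A minimal sketch of the classify-then-load step might look like the following. The keyword-based `classify` function is only a stand-in for the cheap model call described above, so the example runs without network access. The task classes come from the text, while the server names and the class-to-server mapping are hypothetical.

```python
from typing import Callable

# Hypothetical mapping from task class to the servers worth loading for it.
SERVERS_BY_TASK: dict[str, list[str]] = {
    "support": ["crm", "slack"],
    "data-ops": ["warehouse"],
    "billing": ["billing", "crm"],
    "code-review": ["github"],
}

# Stand-in rules; a real deployment would call a small model here instead.
_KEYWORDS = (
    ("refund", "billing"),
    ("invoice", "billing"),
    ("sql", "data-ops"),
    ("pull request", "code-review"),
)


def classify(request: str) -> str:
    """Return a task class for the request; the first matching keyword wins."""
    text = request.lower()
    for keyword, task_class in _KEYWORDS:
        if keyword in text:
            return task_class
    return "support"


def servers_to_load(request: str, classifier: Callable[[str], str] = classify) -> list[str]:
    """Pick the servers whose tool schemas should be exposed for this request."""
    task_class = classifier(request)
    if task_class not in SERVERS_BY_TASK:
        raise ValueError(f"unknown task class: {task_class!r}")
    return list(SERVERS_BY_TASK[task_class])
```

Passing the classifier in as a parameter keeps the routing table testable on its own. Raising on an unknown class is one possible policy; falling back to a default class would be another, and the source does not say which production teams prefer.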
§ 05 · A REFERENCE TOPOLOGY
How a production multi-MCP stack looks
Three things to take away from the topology. The classifier runs before the planner; the planner runs after the relevant servers are loaded. The host is doing real work — it knows about your task taxonomy, your server registry, and the mapping between them. The MCP servers themselves are dumb — they expose tools, validate OAuth, and execute. Smartness lives one layer up.
§ · FURTHER READING
References & deeper sources
- (2026). Many-small MCP Servers Beat One Big One — Production Patterns · Particula Engineering
- (2026). Model Context Protocol — April 2026 Specification · Linux Foundation
- (2024). Introducing the Model Context Protocol · Anthropic Engineering
- (2026). Blueprint: Build a Shared-Skills MCP Server (Supabase + OAuth 2.1) · Brain Drip Blueprints
- (2026). Deploying Remote MCP Servers at the Edge · Cloudflare Blog
Original figures live in the linked sources — open the papers for the canonical visuals in their full context.