
Multi-MCP Architecture

MCP grew from a Claude-only feature into a Linux Foundation standard in 18 months. The patterns that emerged around composing multiple servers are different from what the spec anticipated — and the hidden cost of connecting everything is real.

The bottom line. As of April 2026, MCP is a Linux Foundation standard with 97M downloads, 13K+ public servers, and 10K+ remote endpoints registered. The architectural patterns that emerged: many small servers per domain beat one big monolithic server (Particula Tech, April 2026); remote MCP via Streamable HTTP + OAuth 2.1 is the platform shape, not local stdio; and connecting every server you have costs you context tokens and planner accuracy — 10 servers can mean ~8K tokens of schemas loaded before your prompt even starts. The fix is lazy-loading by task class. The lab below lets you watch the token ledger move.

§ 00 · FROM LOCAL PROTOTYPE TO A STANDARD
18 months from Claude-only to Linux Foundation

Anthropic introduced MCP in late 2024 as a way for Claude Desktop to talk to local tools through a JSON-RPC server. That’s the same shape editor tooling such as the Language Server Protocol had been using for years, lifted into the LLM era. It worked, and it spread fast.

By April 2026 the spec had moved out of Anthropic’s repository and into the Linux Foundation, with 97 million downloads of MCP server packages and over 13,000 public servers in the registry. More important: 10,000+ remote endpoints were registered, meaning the protocol had outgrown its local-only prototype phase. Most production MCP usage is now remote, not stdio.

Two architectural questions matter for any team building on top of this. How do you carve up your servers? And how do you avoid drowning your agent in their combined tool inventory? The rest of this drip is the answers production teams converged on.

§ 01 · MANY SMALL SERVERS BEAT ONE BIG ONE
Microservices for agents

The first design instinct most teams have is to build one MCP server that exposes every tool the agent might need. CRM, billing, inventory, GitHub, Slack — all one server, 50 tools. It works in a demo. It doesn’t work in production.

Particula Tech’s April 2026 production patterns piece documented the failure mode: agents get measurably worse at routing decisions as the tool inventory grows past ~15-20 distinct tools. The model has to triage which tool fits the current intent; the cost of that triage scales superlinearly with the inventory. Even Sonnet-class models start making sloppy calls — picking the wrong tool, mis-ordering arguments, invoking unrelated tools “just in case.”

The fix is microservices for agents. One server per business domain. Each independently deployable. The orchestrator composes them at the host. Concretely:

  1. Draw the domains. Model your business the way you’d model microservices: by who owns what, what changes together, what fails together.
  2. One domain, one server. CRM is its own server; billing is its own server; inventory is its own server. Each exposes 4-8 tools tightly scoped to its domain.
  3. Compose at the host. The orchestrator (Claude Code, Claude Desktop, your own agent harness) connects to multiple servers and routes calls based on intent.
  4. Deploy independently. Each server has its own release cycle, its own deploy schedule, its own on-call rotation. Failure isolation is the prize.
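
The four steps above can be sketched as a host-side registry. This is a minimal illustration, not an MCP SDK API: `DomainServer`, `Host`, the `<domain>.<tool>` naming scheme, and every tool name are hypothetical, and the 8-tool cap simply encodes the upper end of the 4-8 guideline.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class DomainServer:
    name: str                # business domain, e.g. "crm"
    tools: Tuple[str, ...]   # tool names this server exposes


@dataclass
class Host:
    max_tools_per_server: int = 8  # assumed cap, from the 4-8 guideline
    servers: Dict[str, DomainServer] = field(default_factory=dict)

    def register(self, server: DomainServer) -> None:
        # Reject monolith-shaped servers at registration time.
        if len(server.tools) > self.max_tools_per_server:
            raise ValueError(
                f"{server.name} exposes {len(server.tools)} tools; split the domain"
            )
        self.servers[server.name] = server

    def route(self, qualified_tool: str) -> DomainServer:
        # Tools are addressed as "<domain>.<tool>", so routing is a lookup.
        domain, _, tool = qualified_tool.partition(".")
        server = self.servers[domain]
        if tool not in server.tools:
            raise KeyError(f"{domain} has no tool {tool!r}")
        return server


host = Host()
host.register(DomainServer("crm", ("find_contact", "update_contact", "list_deals", "log_note")))
host.register(DomainServer("billing", ("get_invoice", "issue_refund", "list_charges", "update_plan")))
print(host.route("billing.issue_refund").name)
```

Because each domain is a separate entry, removing or redeploying one server leaves the others registered, which is the failure-isolation property the list describes.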

The benefits read like the standard microservices argument applied at a new layer — and that’s the right frame. MCP servers are the microservice abstraction for AI tooling. Treat them that way.

§ 02 · REMOTE MCP — STREAMABLE HTTP + OAUTH
Stdio was the prototype. HTTP is the platform.

The April 2026 MCP spec standardizes two transports: stdio (the local original) and Streamable HTTP (the platform shape). Stdio was the right primitive for the demo era: your IDE spawns a subprocess, talks to it over stdin/stdout, and kills it on shutdown. Streamable HTTP is the right primitive for production: your server runs in the cloud, the client sends HTTP POST requests over HTTPS and can receive streamed responses as Server-Sent Events, and OAuth 2.1 with PKCE handles auth.

The migration is small in code, big in implications. Four moves:

  1. Switch to HTTP transport. In the TypeScript SDK the core change is swapping StdioServerTransport for StreamableHTTPServerTransport and serving it from an HTTP endpoint. The tool handler code stays the same.
  2. Add OAuth 2.1 with PKCE. Validate the bearer token on every request, against the right issuer and audience. RFC 8707 audience binding is the part most teams miss; without it, a token a user obtained for some other MCP server can be replayed against yours.
  3. Ship to the edge. Cloudflare Workers, Vercel Functions, Supabase Edge Functions: anywhere with low cold-start latency and global presence. The model client wants the server fast and close.
  4. Register with the client. Add the URL to the client’s MCP config; the OAuth flow handles the rest.
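
The audience check in step 2 is easy to get wrong, so here is a minimal sketch of the claim validation alone. It assumes the token signature has already been verified by a JWT library and only the decoded claims arrive here. The issuer and resource URLs and the `check_claims` helper are hypothetical placeholders.

```python
import time
from typing import Optional

# Hypothetical values: substitute your own authorization server and resource URL.
EXPECTED_ISSUER = "https://auth.example.com"
EXPECTED_AUDIENCE = "https://mcp.example.com/billing"


def check_claims(claims: dict, now: Optional[float] = None) -> str:
    """Return the user id (sub) if the claims are acceptable.

    Assumes signature verification already happened upstream.
    Raises PermissionError on any mismatch.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != EXPECTED_ISSUER:
        raise PermissionError("wrong issuer")
    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if EXPECTED_AUDIENCE not in audiences:
        raise PermissionError("token was issued for a different resource")
    if claims.get("exp", 0) <= now:
        raise PermissionError("token expired")
    subject = claims.get("sub")
    if not subject:
        raise PermissionError("missing subject")
    return subject
```

A token minted for another MCP server carries that server's URL in `aud`, so it fails the audience check here even though its issuer and signature are valid. That is the replay case the list warns about.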

Once the server is remote, the “works on my laptop” discount goes away. Authentication is real. Rate limiting is real. Per-user data isolation is real. RLS-shaped policies in the database become the actual security boundary, with the OAuth-extracted auth.uid() as the discriminator. The work is the work, but the payoff is that your MCP server runs in the same place your team’s other production services do.

§ 03 · THE HIDDEN COST OF MULTI-MCP
More MCPs. Worse agents.

Here’s the counterintuitive result. Once you have the many-small-servers pattern in place and remote MCP working, the next failure mode is the one nobody warned you about: connecting more servers makes the agent worse.

Each MCP server wires its tool descriptions into every model call. Ten servers averaging ~800 tokens of schemas each means roughly 8,000 tokens of tool inventory loaded before your prompt even starts. The model has less budget for the actual task. And worse — even with a long context window, large tool inventories degrade planner accuracy regardless of whether you have token budget left.
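
The arithmetic is worth running for your own server count. The sketch below uses the ~800-token average from the paragraph above and assumes a 200K-token context window. Both constants are rough inputs to replace with measured values, and the function names are illustrative.

```python
CONTEXT_WINDOW = 200_000         # assumed window size, in tokens
SCHEMA_TOKENS_PER_SERVER = 800   # rough per-server average from the text


def schema_overhead(server_count: int) -> int:
    """Tokens of tool schemas loaded into every model call."""
    return server_count * SCHEMA_TOKENS_PER_SERVER


def budget_left(server_count: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens remaining for the prompt, history, and output."""
    return window - schema_overhead(server_count)


for n in (2, 10, 25):
    print(f"{n:>2} servers: {schema_overhead(n):>6} schema tokens, {budget_left(n):>7} left")
```

With these inputs, 10 servers cost 8,000 tokens per call and 2 servers cost 1,600. The raw budget loss is small next to the window, which is why the planner-accuracy effect matters more than the token count.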

Four things to put in place:

§ 04 · LAZY-LOAD BY TASK CLASS
Classify first. Load second. Trace always.

The lab below makes the dynamics concrete. Toggle ten servers on or off, pick the current task class (support / data-ops / billing / code-review), and switch lazy-load on. The token ledger updates, the task budget moves, and the synthetic planner accuracy responds to tool count.

Lab · MCP token ledger: toggle servers, pick the task, switch on lazy-load, and see what each MCP costs you per call.
Snapshot (10 servers connected, lazy-load off): tool schemas loaded 9.0K; task budget left 179K of a 200K window; planner accuracy 84% on routing decisions.

Lazy-load is off. Every connected server's tool descriptions load into the context regardless of whether the current task needs them. Planner accuracy degrades with tool count, even before token budget runs out.

Two things to feel in the lab. Without lazy-load, every connected server costs you regardless of relevance — the warehouse server is loaded even for a support ticket where nothing in it matters. With lazy-load on, the same set of servers can stay connected without paying their full cost on every turn; the agent only sees the slice it needs.

Implementation-wise, the classifier doesn’t have to be sophisticated. A cheap Haiku-class call (“classify this request into one of: support / data-ops / billing / code-review”) hits 90%+ accuracy on real-world traffic and costs <$0.001 per request. The token savings on the main model call dwarf the classifier cost.
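
A minimal sketch of classify-then-load follows. The keyword matcher is only a stand-in for the cheap model call so that the example runs offline. The server names, keyword lists, and the task-to-server mapping are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from task class to the servers that class needs.
SERVERS_BY_TASK: Dict[str, List[str]] = {
    "support": ["crm", "tickets"],
    "data-ops": ["warehouse", "inventory"],
    "billing": ["billing", "crm"],
    "code-review": ["github"],
}

# Stand-in for the classifier prompt; a real system would call a small model.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "billing": ("invoice", "refund", "charge"),
    "code-review": ("pull request", "diff"),
    "data-ops": ("warehouse", "stock", "export"),
}


def classify(request: str) -> str:
    """Return one of the four task classes for a user request."""
    text = request.lower()
    for task, words in KEYWORDS.items():
        if any(word in text for word in words):
            return task
    return "support"  # fallback class when nothing matches


def servers_for(request: str) -> List[str]:
    """Only these servers' tool schemas go into the main model call."""
    return SERVERS_BY_TASK[classify(request)]


print(servers_for("Please refund the duplicate charge"))
print(servers_for("Can you look at the diff on my pull request?"))
```

The fallback class is a design choice: a misclassified request still gets a working tool set, and logging the chosen class makes misroutes visible later.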

§ 05 · A REFERENCE TOPOLOGY
How a production multi-MCP stack looks

Classifier → host → lazy-loaded MCP servers

Three things to take away from the topology. The classifier runs before the planner; the planner runs after the relevant servers are loaded. The host is doing real work — it knows about your task taxonomy, your server registry, and the mapping between them. The MCP servers themselves are dumb — they expose tools, validate OAuth, and execute. Smartness lives one layer up.
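
The ordering in that topology can be sketched as a small host class. Everything here is illustrative: `LazyHost`, `Trace`, and the injected `classify` and `plan` callables are hypothetical names, and the registry holds plain tool-name lists in place of live MCP connections.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Trace:
    task_class: str
    loaded_servers: List[str]
    tool_count: int


class LazyHost:
    def __init__(
        self,
        classify: Callable[[str], str],
        registry: Dict[str, List[str]],   # server name -> tool names
        taxonomy: Dict[str, List[str]],   # task class -> server names
    ) -> None:
        self.classify = classify
        self.registry = registry
        self.taxonomy = taxonomy

    def handle(
        self, request: str, plan: Callable[[str, List[str]], str]
    ) -> Tuple[str, Trace]:
        task = self.classify(request)          # 1. classifier runs first
        names = self.taxonomy[task]            # 2. host maps class to servers
        tools = [t for n in names for t in self.registry[n]]  # 3. load that slice
        result = plan(request, tools)          # 4. planner sees only the slice
        return result, Trace(task, list(names), len(tools))


host = LazyHost(
    classify=lambda request: "billing",
    registry={
        "billing": ["get_invoice", "issue_refund"],
        "crm": ["find_contact"],
        "github": ["list_pull_requests"],
    },
    taxonomy={"billing": ["billing", "crm"]},
)
result, trace = host.handle("Refund the last charge", plan=lambda request, tools: tools[0])
print(result, trace)
```

Returning a `Trace` with every result keeps a record of what was loaded per request, so the effect of a taxonomy change can be checked afterwards.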

CHECK · Your agent has 8 MCP servers connected. The average user request only needs tools from 2 of them. Without changing anything else, which intervention has the highest expected impact on production quality?

§ · FURTHER READING
References & deeper sources

  1. Particula Tech (2026). Many-small MCP Servers Beat One Big One — Production Patterns · Particula Engineering
  2. Linux Foundation MCP Working Group (2026). Model Context Protocol — April 2026 Specification · Linux Foundation
  3. Anthropic (2024). Introducing the Model Context Protocol · Anthropic Engineering
  4. Brain Drip Editors (2026). Blueprint: Build a Shared-Skills MCP Server (Supabase + OAuth 2.1) · Brain Drip Blueprints
  5. Cloudflare (2026). Deploying Remote MCP Servers at the Edge · Cloudflare Blog
