There are 58 public MCP servers in the awslabs/mcp repository — each one a focused interface to a cloud service: CloudWatch, ECS, CDK, DynamoDB, IAM, cost analysis, documentation search, and 51 more. They’re open source, maintained, and wire directly into any MCP-compatible agent via stdio.
The naive approach is to add all of them to your config file and let them load. This works on a personal machine for a subset of servers. It doesn’t work for a team, and it doesn’t scale: 50+ servers loading at session start burns hundreds of thousands of tokens before anyone types a question. The enterprise pattern is different — proxy, aggregate, host.
This post covers all three steps.
Step 1: The Proxy
The progressive disclosure proxy is the right local architecture for a large tool set. Instead of loading all tool schemas into the agent’s context at session start, the proxy:
- Spawns each server as a subprocess at startup
- Collects all tool schemas via the initialize / tools/list handshake
- Builds a BM25 search index over all tool names, descriptions, and parameter descriptions
- Exposes two meta-tools to the agent: discover_tools(query) and call_tool(name, arguments)
- Loads a small always-on set of high-frequency tools unconditionally
Session startup goes from ~200k tokens to ~2k. The full catalog is available on demand. The agent calls discover_tools("check ECS service health"), gets back the 3–5 most relevant tool schemas, and executes from there.
For the awslabs catalog specifically, the proxy aggregates several hundred tools across 58 servers into a single discoverable surface. The agent doesn’t know or care which server owns which tool — it calls call_tool("get_log_events", {...}) and the proxy routes to CloudWatch.
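The discovery half of this can be sketched in a few lines. The catalog entries below are illustrative stand-ins (not the actual awslabs tool schemas), and a plain token-overlap score stands in for the BM25 ranking the proxy would really use:

```python
# Hypothetical mini-catalog; the real proxy builds this from each
# server's tools/list response at startup.
CATALOG = [
    {"name": "get_log_events", "server": "cloudwatch-mcp-server",
     "description": "Fetch log events from a CloudWatch log group"},
    {"name": "describe_services", "server": "ecs-mcp-server",
     "description": "Describe ECS services and their health status"},
    {"name": "search_documentation", "server": "aws-documentation-mcp-server",
     "description": "Search AWS documentation pages"},
]

def discover_tools(query: str, k: int = 5) -> list[dict]:
    """Return up to k catalog entries whose text best overlaps the query.

    Simplified stand-in for the BM25 ranking described above.
    """
    q = set(query.lower().split())

    def score(entry: dict) -> int:
        text = f"{entry['name']} {entry['description']}".lower().split()
        return len(q & set(text))

    ranked = sorted(CATALOG, key=score, reverse=True)
    return [e for e in ranked[:k] if score(e) > 0]
```

The agent then feeds a returned name straight into call_tool; only the schemas of the few matched tools ever enter its context.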
What the Catalog Covers
The 58 servers break into five categories:
Infrastructure and compute: ECS, EKS, Lambda, Step Functions, Serverless, IAM, networking, SAP management, container tooling (Finch), IaC
Data and storage: DynamoDB, PostgreSQL, MySQL, Redshift, OpenSearch, DocumentDB, ElastiCache, Keyspaces (Cassandra), Neptune (graph), Aurora DSQL, S3 Tables, Memcached, Valkey, Timestream for InfluxDB, Prometheus
Operations and observability: CloudWatch, CloudWatch Application Signals, CloudTrail, cost and billing, full AWS API wrapper, AWS Support, Well-Architected security review
AI/ML services: Bedrock KB retrieval, AgentCore, SageMaker (including Unified Studio Spark tooling), Translate, SNS/SQS, Q Business, Q Index, knowledge management, document loading, health imaging, health lake
Developer utilities: AWS documentation search, CDK, OpenAPI server generation, AppSync, pricing, location services, IoT SiteWise, data processing, Health Omics, Transform
The documentation server alone changes how agents interact with service documentation — search_documentation and read_documentation_page replace browser round-trips. The CDK server gives agents access to construct APIs and code samples during infrastructure work. The cost and billing server turns billing questions into structured queries.
The aws-api-mcp-server is the catch-all: it wraps the full AWS CLI API surface for services that don’t have a dedicated server yet. Broad coverage, less ergonomic — keep it behind discovery rather than the always-on set.
The catalog is also still growing. The count was around 25 six months ago; at 58 today, and given the pace of additions, it will likely pass 80 by year end. Any architecture that requires adding each server manually to developer configs becomes a maintenance problem fast.
Step 2: The Gateway
Running 50+ subprocesses on a developer’s laptop is fine for individuals who selectively use a subset. For a team, it’s the wrong architecture for three reasons:
- Setup friction — every developer needs npx, the right Node version, AWS credentials configured, all servers installed. Non-trivial onboarding surface, and it grows with every server added to the catalog.
- Credential management — each developer needs cloud credentials that reach the services. In an enterprise, that means IAM roles, permission boundaries, and audit trails for dozens of tool surfaces across N developers.
- Reproducibility — tool behavior depends on the installed version of each server and the local Python/Node environment. A team of 10 is running 10 slightly different setups.
The fix is to host the proxy as an HTTP MCP server behind a gateway.
Developers
↕ (HTTP/SSE)
Gateway (auth, routing)
↕ (internal HTTP)
Hosted Proxy
├── cloudwatch-mcp-server (subprocess)
├── cdk-mcp-server (subprocess)
├── ecs-mcp-server (subprocess)
└── ... 55 more
The hosted proxy runs once, in a container, with a shared cloud identity. Developers connect over HTTP with JWT auth. They get the full catalog without any local setup. The proxy container’s IAM role controls what the tools can do — one policy, auditable, managed centrally.
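The routing core of that hosted proxy is small. A minimal sketch, assuming one stdio subprocess per server and a tool-to-server map built from the startup tools/list pass — the child process here is a stand-in echo server, not a real MCP server, and the server/tool names are illustrative:

```python
import json
import subprocess
import sys

# Stand-in child: reads one JSON-RPC request line, echoes the tool name back.
# A real deployment would spawn the actual server process instead.
CHILD = (
    "import json,sys\n"
    "req = json.loads(sys.stdin.readline())\n"
    "print(json.dumps({'jsonrpc': '2.0', 'id': req['id'],"
    " 'result': {'echoed': req['params']['name']}}), flush=True)\n"
)

class Proxy:
    def __init__(self):
        # One subprocess per server, spawned once at container startup.
        self.servers = {
            "cloudwatch": subprocess.Popen(
                [sys.executable, "-c", CHILD],
                stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True),
        }
        # tool name -> owning server, built from each server's tools/list.
        self.routes = {"get_log_events": "cloudwatch"}

    def call_tool(self, name: str, arguments: dict) -> dict:
        proc = self.servers[self.routes[name]]
        req = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
               "params": {"name": name, "arguments": arguments}}
        proc.stdin.write(json.dumps(req) + "\n")
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())["result"]
```

The agent-facing call_tool never exposes which server handled the request, which is exactly the property the single discoverable surface depends on.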
The Auth Pattern
The gateway sits between developers and the hosted proxy. The auth pattern that makes this work in an enterprise without static keys:
- Cognito User Pool (or your IdP — Okta, Azure AD, same OIDC protocol) authenticates the developer
- PKCE browser login for interactive sessions — no client secret, works with any OIDC provider
- STS AssumeRoleWithWebIdentity exchanges the IdToken for temporary AWS credentials
- JWT from Cognito authenticates to the gateway (CUSTOM_JWT auth on the gateway validates it)
- The hosted proxy's IAM role (not developer credentials) makes the actual cloud API calls
Zero static keys anywhere in the chain. The developer proves identity via IdP; the proxy’s service role does the cloud work. Rotate or revoke the proxy’s IAM role and every connected session is affected immediately.
This pattern is portable: swap Cognito for any OIDC provider. The STS federation step is standard. The gateway JWT validation is standard. The tools themselves are AWS-specific, but the auth architecture is not.
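The STS federation step is one API call. A sketch of the exchange, with the client injected so the flow runs without live AWS access — in practice `sts` would be `boto3.client("sts")`, and the role ARN and session name here are hypothetical:

```python
def exchange_id_token(sts, id_token: str, role_arn: str) -> dict:
    """Exchange an IdP IdToken for temporary AWS credentials via STS.

    `sts` is injected (boto3.client("sts") in production) so this sketch
    runs without AWS access.
    """
    resp = sts.assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName="mcp-gateway-session",  # hypothetical session name
        WebIdentityToken=id_token,
        DurationSeconds=3600,
    )
    creds = resp["Credentials"]
    # The launcher exports these; they expire on their own, so nothing
    # static ever lands on the developer machine.
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
```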
What the Developer Experience Looks Like
After the gateway is deployed and the developer is provisioned in the IdP:
- Run the session launcher — it opens a browser tab for PKCE login, exchanges for temp creds, starts the agent with the gateway endpoint configured as an MCP server
- The agent connects; session startup costs ~2k tokens (the two meta-tools + always-on set)
- Ask it to check service health, query costs, search CDK docs, inspect a database schema — it discovers and calls the right tools
The developer never configures a server, never manages credentials for dozens of tools, never installs npx packages. The proxy and gateway are someone else's problem.
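The one piece of client-side machinery, the PKCE login in the session launcher, reduces to generating a verifier/challenge pair before opening the browser. A minimal sketch per RFC 7636 (S256 method); no client secret is involved:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636)."""
    # 32 random bytes -> 43-char base64url string, padding stripped.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The launcher puts the challenge in the authorize URL and holds the verifier for the token exchange; the IdP confirms the same client completed both legs.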
What the Operator Experience Looks Like
Deploying and maintaining this:
- One container running the proxy — update it to add servers, change the always-on set, tune the search index
- One IAM role for the proxy — adjust permissions in one place to control what all connected developers can do
- One gateway — add developer users in the IdP, no per-developer AWS config
- CloudWatch for the proxy container gives you tool call volume, latency, and error rates across the whole team
The gap in a conventional per-developer setup is that there’s no operational visibility: you don’t know how often agents use the documentation server versus the CDK server, you can’t throttle expensive operations, and you can’t rotate credentials without touching every developer machine. The hosted proxy closes all three.
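That visibility can be as simple as one structured log line per tool call, which CloudWatch Logs metric filters then aggregate into volume, latency, and error-rate metrics. A sketch with an illustrative field layout, not a fixed schema:

```python
import json
import time

def log_tool_call(tool: str, server: str, latency_ms: float, ok: bool) -> str:
    """Emit one JSON log line per tool call for downstream metric filters."""
    line = json.dumps({
        "event": "tool_call",
        "tool": tool,
        "server": server,
        "latency_ms": round(latency_ms, 1),
        "ok": ok,
        "ts": int(time.time()),
    })
    print(line)  # one JSON object per line, filter-friendly
    return line
```

Because every developer's traffic flows through the one hosted proxy, this single emit point covers the whole team — exactly what the per-laptop setup cannot give you.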
The Incremental Path
You don’t deploy a 58-server gateway in one shot. The practical sequence:
- Build the proxy locally with the servers you actually use
- Confirm the discovery and routing work correctly — verify that discover_tools("your common tasks") returns the right tools
- Containerize the proxy
- Deploy to a compute layer of your choice — any container host works
- Wire a gateway in front of it with JWT auth
- Ship the session launcher to developers
Steps 1–2 work standalone with no gateway. Steps 3–5 are one-time infrastructure. Step 6 is the distribution mechanism.
The fully hosted version is a significant increment over local setup, but it’s not a prerequisite for getting value from the catalog. Start with the proxy. The gateway comes later.
The awslabs catalog has grown fast enough — 25 to 58 servers in roughly six months — that it’s now a serious tool surface. The pattern for making it usable at scale is straightforward. What’s missing for most teams isn’t the tooling; it’s the decision to build the infrastructure around it.
That decision is worth making.