Having multiple MCP servers running eats into the context window
### Environment

Platform (select one):
- [ ] Anthropic API
- [ ] AWS Bedrock
- [ ] Google Vertex AI
- [x] Other: Claude Code (local CLI with multiple MCP servers)

Claude CLI version: <!-- Replace with actual output of `claude --version` -->
Operating System: Ubuntu 22.04 (inside WSL)
Terminal: VS Code integrated terminal
### Bug Description

When running many MCP servers simultaneously (e.g., ~20 local MCP server processes), the context window in Claude Code depletes rapidly. Context usage starts at ~8–18% but consumes the entire available context after only ~5 prompts, making the session unusable. This occurs even when the prompts themselves are short.
### Steps to Reproduce

1. Start ~20 local MCP server instances connected to Claude Code.
2. Open a Claude Code session in your terminal or VS Code.
3. Begin sending prompts as normal.
4. Observe the context percentage in Claude Code; note how it increases rapidly with each prompt, even when the prompt is small.
5. After ~5 prompts, the context window reaches 100% consumed, forcing a session reset.
### Expected Behavior
The context window should remain stable or grow proportionally to the actual prompt/response size, not inflate excessively due to multiple MCP servers running. Running multiple MCP processes should not artificially bloat context usage.
### Actual Behavior
Context window usage grows dramatically and unexpectedly when many MCP servers are active. This results in the session maxing out context capacity far sooner than expected, requiring frequent resets to continue using Claude Code effectively.
### Additional Context
- Behavior confirmed when running ~20 MCP servers locally.
- Issue not observed with only a few MCP servers (<5) running concurrently.
I've created a working proof-of-concept that addresses this exact issue:
🔗 Repository: https://github.com/machjesusmoto/claude-lazy-loading
📝 Full discussion: #7336
Results achieved:
- 95% token reduction (108k → 5k tokens)
- Lightweight registry approach (~500 tokens)
- Intelligent keyword-based loading
- Working code you can test
While it requires Claude Code native support for true lazy loading, it demonstrates the solution and provides the blueprint for implementation.
The approach could reduce your MCP context consumption from eating into your window to just 2.5% overhead.
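In spirit, the registry approach works by advertising only a lightweight name-plus-keywords record per server, then loading a server's full tool schemas only when a prompt actually mentions something relevant. A minimal Go sketch of that idea (server names and keywords here are made up for illustration; this is not the PoC's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// ServerEntry is a lightweight registry record: a name plus a few
// keywords, instead of the server's full tool schemas. The whole
// registry fits in a few hundred tokens rather than tens of thousands.
type ServerEntry struct {
	Name     string
	Keywords []string
	loaded   bool
}

// Registry matches prompts to servers by keyword, so schemas are
// loaded only for servers a prompt actually needs.
type Registry struct {
	servers []*ServerEntry
}

// Match returns the servers whose keywords appear in the prompt and
// marks them as loaded (the real PoC would fetch schemas here).
func (r *Registry) Match(prompt string) []string {
	p := strings.ToLower(prompt)
	var hits []string
	for _, s := range r.servers {
		for _, kw := range s.Keywords {
			if strings.Contains(p, kw) {
				s.loaded = true
				hits = append(hits, s.Name)
				break
			}
		}
	}
	return hits
}

func main() {
	reg := &Registry{servers: []*ServerEntry{
		{Name: "github-mcp", Keywords: []string{"repo", "pull request", "issue"}},
		{Name: "postgres-mcp", Keywords: []string{"sql", "query", "table"}},
		{Name: "browser-mcp", Keywords: []string{"browse", "url", "scrape"}},
	}}
	fmt.Println(reg.Match("open a pull request on the repo")) // only github-mcp matches
}
```

The trade-off is that keyword matching can miss a server the prompt needed, which is why native support in Claude Code (rather than a proxy heuristic) is the real fix.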
It's worth noting that MCP toggle functionality has been added in Claude Code 2.0.10:

> **2.0.10**
> - Rewrote terminal renderer for buttery smooth UI
> - Enable/disable MCP servers by @-mentioning, or in /mcp
> - Added tab completion for shell commands in bash mode
> - PreToolUse hooks can now modify tool inputs
> - Press Ctrl-G to edit your prompt in your system's configured text editor
> - Fixes for bash permission checks with environment variables in the command
@lukemmtt thanks for posting this. I noticed the appearance of the Ctrl+G help text, but hadn't taken the time to check out the changelog and see what else was introduced. I was too hyper-focused on publishing v0.1.0 and then rapid iteration of this: machjesusmoto/mcp-toggle
Now I have to decide whether I care to continue developing something that can never be what I want it to be (and what we all actually want/need).
### Another community workaround: lazy-mcp-preload
I've created a fork of voicetreelab/lazy-mcp that adds background server preloading to eliminate the first-call latency while maintaining the 95% token savings.
### Repository
🔗 https://github.com/iamsamuelrodda/lazy-mcp-preload
### The Problem with Existing Lazy Loading
While lazy-mcp achieves ~95% token reduction by exposing only 2 meta-tools instead of all tool schemas, it incurs ~500ms latency on the first tool call to each server (cold start).
### The Solution: Background Preloading
I added a `preloadAll` config option that starts all MCP servers in parallel background goroutines immediately at proxy startup. By the time you need a tool, the servers are already warm.
```json
{
  "mcpProxy": {
    "options": {
      "lazyLoad": true,
      "preloadAll": true
    }
  }
}
```
### Results
| Metric | Direct MCP | lazy-mcp | lazy-mcp-preload |
|---|---|---|---|
| Startup tokens | ~15,000 | ~800 | ~800 |
| Context savings | 0% | 95% | 95% |
| First-call latency | 0ms | ~500ms | ~0ms |
| Tools visible | 30 | 2 | 2 |
### How It Works
```
Claude Code session starts
        │
        ▼
lazy-mcp-preload proxy starts
        │
        ├──► Main thread: Ready with 2 meta-tools (~800 tokens)
        │
        └──► Background goroutines (parallel):
                ├─ Preload server 1
                ├─ Preload server 2
                └─ Preload server 3
                        │
                        ▼
        All servers warm before first tool call
```
### Installation
```bash
git clone https://github.com/iamsamuelrodda/lazy-mcp-preload
cd lazy-mcp-preload
make build
make generate-hierarchy
./scripts/deploy.sh
```
This is a workaround until native lazy loading support lands in Claude Code. Hope it helps others experiencing this issue!
This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes.