Having multiple MCP servers running eats into the context window
### Environment

Platform (select one):
- [ ] Anthropic API
- [ ] AWS Bedrock
- [ ] Google Vertex AI
- [x] Other: Claude Code (local CLI with multiple MCP servers)

Claude CLI version: <!-- Replace with actual output of `claude --version` -->
Operating System: Ubuntu 22.04 (inside WSL)
Terminal: VS Code integrated terminal
### Bug Description

When running many MCP servers simultaneously (e.g., ~20 local MCP server processes), the context window in Claude Code depletes rapidly. Context usage starts at ~8–18% but consumes the entire available context after only ~5 prompts, making the session unusable. This occurs even when the prompts themselves are short.
### Steps to Reproduce

1. Start ~20 local MCP server instances connected to Claude Code.
2. Open a Claude Code session in your terminal or VS Code.
3. Begin sending prompts as normal.
4. Observe the context percentage in Claude Code; note how it increases rapidly with each prompt, even when the prompt is small.
5. After ~5 prompts, the context window reaches 100% consumed, forcing a session reset.
### Expected Behavior
The context window should remain stable or grow proportionally to the actual prompt/response size, not inflate excessively due to multiple MCP servers running. Running multiple MCP processes should not artificially bloat context usage.
### Actual Behavior
Context window usage grows dramatically and unexpectedly when many MCP servers are active. This results in the session maxing out context capacity far sooner than expected, requiring frequent resets to continue using Claude Code effectively.
### Additional Context
- Behavior confirmed when running ~20 MCP servers locally.
- Issue not observed with only a few MCP servers (<5) running concurrently.
I've created a working proof-of-concept that addresses this exact issue:
🔗 Repository: https://github.com/machjesusmoto/claude-lazy-loading
📝 Full discussion: #7336
Results achieved:
- 95% token reduction (108k → 5k tokens)
- Lightweight registry approach (~500 tokens)
- Intelligent keyword-based loading
- Working code you can test
While it requires Claude Code native support for true lazy loading, it demonstrates the solution and provides the blueprint for implementation.
The approach could reduce your MCP context consumption from eating into your window to just 2.5% overhead.
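In spirit, the registry approach works by advertising only a lightweight name-plus-keywords record per server, then loading a server's full tool schemas only when a prompt actually mentions something relevant. A minimal Go sketch of that idea (server names and keywords here are made up for illustration; this is not the PoC's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// ServerEntry is a lightweight registry record: a name plus a few
// keywords, instead of the server's full tool schemas. The whole
// registry fits in a few hundred tokens rather than tens of thousands.
type ServerEntry struct {
	Name     string
	Keywords []string
	loaded   bool
}

// Registry matches prompts to servers by keyword, so schemas are
// loaded only for servers a prompt actually needs.
type Registry struct {
	servers []*ServerEntry
}

// Match returns the servers whose keywords appear in the prompt and
// marks them as loaded (the real PoC would fetch schemas here).
func (r *Registry) Match(prompt string) []string {
	p := strings.ToLower(prompt)
	var hits []string
	for _, s := range r.servers {
		for _, kw := range s.Keywords {
			if strings.Contains(p, kw) {
				s.loaded = true
				hits = append(hits, s.Name)
				break
			}
		}
	}
	return hits
}

func main() {
	reg := &Registry{servers: []*ServerEntry{
		{Name: "github-mcp", Keywords: []string{"repo", "pull request", "issue"}},
		{Name: "postgres-mcp", Keywords: []string{"sql", "query", "table"}},
		{Name: "browser-mcp", Keywords: []string{"browse", "url", "scrape"}},
	}}
	fmt.Println(reg.Match("open a pull request on the repo")) // only github-mcp matches
}
```

The trade-off is that keyword matching can miss a server the prompt needed, which is why native support in Claude Code (rather than a proxy heuristic) is the real fix.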
It's worth noting that MCP toggle functionality has been added in Claude Code 2.0.10:

> **2.0.10**
> - Rewrote terminal renderer for buttery smooth UI
> - Enable/disable MCP servers by @-mentioning, or in /mcp
> - Added tab completion for shell commands in bash mode
> - PreToolUse hooks can now modify tool inputs
> - Press Ctrl-G to edit your prompt in your system's configured text editor
> - Fixes for bash permission checks with environment variables in the command
@lukemmtt thanks for posting this. I noticed the appearance of the Ctrl+G help text, but hadn't taken the time to check out the changelog and see what else was introduced. I was too hyper-focused on publishing v0.1.0 and then rapid iteration of this: machjesusmoto/mcp-toggle
Now I have to decide whether I care to continue developing something that can never be what I want it to be (and what we all actually want/need).
### Another community workaround: lazy-mcp-preload
I've created a fork of voicetreelab/lazy-mcp that adds background server preloading to eliminate the first-call latency while maintaining the 95% token savings.
### Repository
🔗 https://github.com/iamsamuelrodda/lazy-mcp-preload
### The Problem with Existing Lazy Loading
While lazy-mcp achieves ~95% token reduction by exposing only 2 meta-tools instead of all tool schemas, it incurs ~500ms latency on the first tool call to each server (cold start).
### The Solution: Background Preloading
I added a `preloadAll` config option that starts all MCP servers in parallel background goroutines immediately at proxy startup. By the time you need a tool, the servers are already warm.
```json
{
  "mcpProxy": {
    "options": {
      "lazyLoad": true,
      "preloadAll": true
    }
  }
}
```
### Results
| Metric | Direct MCP | lazy-mcp | lazy-mcp-preload |
|---|---|---|---|
| Startup tokens | ~15,000 | ~800 | ~800 |
| Context savings | 0% | 95% | 95% |
| First-call latency | 0ms | ~500ms | ~0ms |
| Tools visible | 30 | 2 | 2 |
### How It Works
```
Claude Code session starts
        │
        ▼
lazy-mcp-preload proxy starts
        │
        ├──► Main thread: Ready with 2 meta-tools (~800 tokens)
        │
        └──► Background goroutines (parallel):
                ├─ Preload server 1
                ├─ Preload server 2
                └─ Preload server 3
                        │
                        ▼
        All servers warm before first tool call
```
### Installation
```bash
git clone https://github.com/iamsamuelrodda/lazy-mcp-preload
cd lazy-mcp-preload
make build
make generate-hierarchy
./scripts/deploy.sh
```
This is a workaround until native lazy loading support lands in Claude Code. Hope it helps others experiencing this issue!
This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes.