
Having multiple MCP servers running eats into the context window

Open • vanman2024 opened this issue 7 months ago • 5 comments

Environment

  • Platform (select one):

    • [ ] Anthropic API
    • [ ] AWS Bedrock
    • [ ] Google Vertex AI
    • [x] Other: Claude Code (local CLI with multiple MCP servers)
  • Claude CLI version: not provided

  • Operating System: Ubuntu 22.04 (inside WSL)

  • Terminal: VS Code integrated terminal

Bug Description

When running multiple Claude MCP servers simultaneously (e.g., ~20 local MCP server processes), the context window in Claude Code rapidly depletes. Context usage starts at ~8–18% but reaches 100% after only ~5 prompts, making the session unusable. This happens even when the prompts themselves are short.
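For a sense of scale: every connected MCP server contributes its tool schemas to the context up front, so the fixed overhead grows multiplicatively with server and tool count. A back-of-envelope sketch in Go (all numbers are illustrative assumptions, not measurements):

package main

import "fmt"

func main() {
	const (
		servers        = 20   // as in this report
		toolsPerServer = 15   // illustrative guess
		charsPerSchema = 1200 // name + description + input schema, illustrative
	)
	totalChars := servers * toolsPerServer * charsPerSchema
	// Rough heuristic: ~4 characters per token.
	fmt.Printf("~%d tokens of fixed overhead before the first prompt\n", totalChars/4)
	// 20 * 15 * 1200 / 4 = 90,000 tokens under these assumptions.
}

Actual per-schema sizes vary widely; the point is that the overhead scales with both the number of servers and the number of tools each exposes, independent of prompt size.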

Steps to Reproduce

  1. Start ~20 local MCP server instances connected to Claude Code.
  2. Open a Claude Code session in your terminal or VS Code.
  3. Begin sending prompts as normal.
  4. Observe the context percentage in Claude Code; note how it increases rapidly with each prompt, even if prompt size is small.
  5. After ~5 prompts, context window reaches 100% consumed, forcing session reset.

Expected Behavior

The context window should remain stable or grow proportionally to the actual prompt/response size, not inflate excessively due to multiple MCP servers running. Running multiple MCP processes should not artificially bloat context usage.

Actual Behavior

Context window usage grows dramatically and unexpectedly when many MCP servers are active. This results in the session maxing out context capacity far sooner than expected, requiring frequent resets to continue using Claude Code effectively.

Additional Context

  • Behavior confirmed when running ~20 MCP servers locally.
  • Issue not observed with only a few MCP servers (<5) running concurrently.

vanman2024 · Jul 05 '25 22:07

I've created a working proof-of-concept that addresses this exact issue:

🔗 Repository: https://github.com/machjesusmoto/claude-lazy-loading
📝 Full discussion: #7336

Results achieved:

  • 95% token reduction (108k → 5k tokens)
  • Lightweight registry approach (~500 tokens)
  • Intelligent keyword-based loading
  • Working code you can test

While true lazy loading ultimately needs native Claude Code support, this proof of concept demonstrates the approach and provides a blueprint for implementation.

The approach could cut MCP context consumption from eating most of your window down to roughly 2.5% overhead.
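A minimal sketch of the registry plus keyword-based loading idea (hypothetical names and data, not the actual claude-lazy-loading code): instead of sending every server's full tool schemas, only a compact registry goes to the model, and full schemas are loaded on demand when a prompt matches a server's keywords.

package main

import (
	"fmt"
	"strings"
)

// registryEntry is a compact, few-token description of one server.
type registryEntry struct {
	Server   string
	Keywords []string
}

var registry = []registryEntry{
	{"github-mcp", []string{"repo", "issue", "pull request"}},
	{"postgres-mcp", []string{"sql", "query", "table"}},
}

// serversFor picks the servers whose keywords appear in the prompt;
// only their full schemas would then be loaded into context.
func serversFor(prompt string) []string {
	p := strings.ToLower(prompt)
	var matched []string
	for _, e := range registry {
		for _, kw := range e.Keywords {
			if strings.Contains(p, kw) {
				matched = append(matched, e.Server)
				break
			}
		}
	}
	return matched
}

func main() {
	fmt.Println(serversFor("open a pull request for this repo")) // [github-mcp]
}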

machjesusmoto · Sep 10 '25 19:09

It's worth noting that MCP Toggle functionality has been added in Claude Code 2.0.10:

2.0.10

  • Rewrote terminal renderer for buttery smooth UI
  • Enable/disable MCP servers by @mentioning, or in /mcp
  • Added tab completion for shell commands in bash mode
  • PreToolUse hooks can now modify tool inputs
  • Press Ctrl-G to edit your prompt in your system's configured text editor
  • Fixes for bash permission checks with environment variables in the command

lukemmtt · Oct 10 '25 18:10

@lukemmtt thanks for posting this. I noticed the appearance of the Ctrl+G help text, but hadn't taken the time to check the changelog and see what else was introduced. I was too hyper-focused on publishing v0.1.0 and then rapidly iterating on machjesusmoto/mcp-toggle.

Now I have to decide whether I care to continue developing something that can never be what I want it to be (and what we all actually want/need).

machjesusmoto · Oct 12 '25 07:10

Another community workaround: lazy-mcp-preload

I've created a fork of voicetreelab/lazy-mcp that adds background server preloading to eliminate the first-call latency while maintaining the 95% token savings.

Repository

🔗 https://github.com/iamsamuelrodda/lazy-mcp-preload

The Problem with Existing Lazy Loading

While lazy-mcp achieves ~95% token reduction by exposing only 2 meta-tools instead of all tool schemas, it incurs ~500ms latency on the first tool call to each server (cold start).
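The cold start can be pictured with a sync.Once guard (a sketch with hypothetical names, not the actual lazy-mcp code): the first call to a server pays the spawn-and-handshake cost, and later calls reuse the warm process.

package main

import (
	"fmt"
	"sync"
	"time"
)

type server struct {
	name string
	once sync.Once
}

// ensureStarted spawns the downstream MCP process on first use only.
func (s *server) ensureStarted() {
	s.once.Do(func() {
		time.Sleep(500 * time.Millisecond) // stand-in for spawn + handshake
		fmt.Println("started", s.name)
	})
}

func (s *server) callTool(tool string) {
	s.ensureStarted() // the ~500ms cold start lands on the first call
	fmt.Println("calling", tool, "on", s.name)
}

func main() {
	s := &server{name: "github-mcp"}
	s.callTool("create_issue") // ~500ms: cold start
	s.callTool("create_issue") // ~0ms: already warm
}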

The Solution: Background Preloading

Added a preloadAll config option that starts all MCP servers in parallel background goroutines immediately at proxy startup. By the time you need a tool, the servers are already warm.

{
  "mcpProxy": {
    "options": {
      "lazyLoad": true,
      "preloadAll": true
    }
  }
}
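In sketch form, the preload path amounts to firing one background goroutine per server and deliberately not waiting on any of them (hypothetical names; see the repository for the real implementation):

package main

import (
	"fmt"
	"time"
)

// preloadAll warms every server in a parallel background goroutine.
// It returns immediately, so the proxy is ready with its 2 meta-tools
// while warming continues behind the scenes.
func preloadAll(servers []string) {
	for _, name := range servers {
		go func(n string) {
			time.Sleep(500 * time.Millisecond) // stand-in for spawn + handshake
			fmt.Println("warm:", n)
		}(name)
	}
	// Deliberately no WaitGroup: startup must not block on warming.
}

func main() {
	preloadAll([]string{"server-1", "server-2", "server-3"})
	fmt.Println("proxy ready with 2 meta-tools (~800 tokens)")
	time.Sleep(time.Second) // keep the demo alive long enough to observe warming
}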

Results

| Metric             | Direct MCP | lazy-mcp | lazy-mcp-preload |
|--------------------|------------|----------|------------------|
| Startup tokens     | ~15,000    | ~800     | ~800             |
| Context savings    | 0%         | 95%      | 95%              |
| First-call latency | 0ms        | ~500ms   | ~0ms             |
| Tools visible      | 30         | 2        | 2                |

How It Works

Claude Code session starts
         │
         ▼
lazy-mcp-preload proxy starts
         │
         ├──► Main thread: Ready with 2 meta-tools (~800 tokens)
         │
         └──► Background goroutines (parallel):
                 ├─ Preload server 1
                 ├─ Preload server 2
                 └─ Preload server 3
                           │
                           ▼
              All servers warm before first tool call

Installation

git clone https://github.com/iamsamuelrodda/lazy-mcp-preload
cd lazy-mcp-preload
make build
make generate-hierarchy
./scripts/deploy.sh
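After deploying, you would point Claude Code at the proxy as its single MCP server; claude mcp add is the stock registration command, though the binary name and path below are assumptions about this fork's build output:

claude mcp add lazy-mcp-preload -- ./build/lazy-mcp-preload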

This is a workaround until native lazy loading support lands in Claude Code. Hope it helps others experiencing this issue!

IAMSamuelRodda · Nov 27 '25 02:11

This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes.

github-actions[bot] · Dec 27 '25 10:12