MCP SDK hangs
Describe the bug
I have developed a local MCP server that performs retrieval against an Elasticsearch database using hybrid search. The MCP server works fine when added to Claude Desktop, and when run with the MCP Inspector via `mcp dev mcp_server.py`. Here's how I've added it to Claude Desktop (I have replaced my uv directory, GCP credentials and project name):
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "uv",
      "args": [
        "--directory",
        "my-directory",
        "run",
        "mcp_server.py"
      ],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "(...)/application_default_credentials.json",
        "GOOGLE_CLOUD_PROJECT": "my-gcp-project"
      }
    }
  }
}
```
However, when I connect to the MCP server locally over the "stdio" transport, the connection hangs. The MCP server performs its task, but for some reason the results are never passed back to the LLM API.
To Reproduce
I run the following script based on the docs:
```python
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from google import genai

# GEMINI_API_KEY is assumed to be set in the environment
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]
client = genai.Client(api_key=GEMINI_API_KEY)

# Create server parameters for stdio connection
server_params = StdioServerParameters(
    command="/opt/homebrew/bin/uv",  # Executable
    args=["run", "mcp_server.py"],  # MCP server
    env={
        "GOOGLE_APPLICATION_CREDENTIALS": "(...)/application_default_credentials.json",
        "GOOGLE_CLOUD_PROJECT": "my-gcp-project",
    },
)


async def run():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Prompt
            prompt = "This is a test"

            # Initialize the connection between client and server
            await session.initialize()

            # Send request to the model with MCP function declarations
            response = await client.aio.models.generate_content(
                model="gemini-2.0-flash",
                contents=prompt,
                config=genai.types.GenerateContentConfig(
                    temperature=0,
                    tools=[session],  # uses the session, will automatically call the tool
                    # Uncomment if you **don't** want the SDK to automatically call the tool
                    # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig(
                    #     disable=True
                    # ),
                ),
            )
            print(response.text)


# Start the asyncio event loop and run the main function
asyncio.run(run())
```
This is the output:
```
(env) nielsrogge@Nielss-MacBook-Air env % uv run test.py
INFO:mcp.server.lowlevel.server:Processing request of type ListToolsRequest
INFO:mcp.server.lowlevel.server:Processing request of type CallToolRequest
INFO:app.implementations.retrievers.python_retriever:Search method: hybrid
INFO:app.implementations.retrievers.python_retriever:Initial query given to the retriever: test
INFO:elastic_transport.transport:POST <my-elastic-search-endpoint> [status:200 duration:0.316s]
INFO:app.implementations.vector_databases.elastic_search_vector_database:ES response time via 'took' (ms): 125
INFO:app.implementations.vector_databases.elastic_search_vector_database:ES response time (s): 0.4781339168548584
INFO:app.implementations.retrievers.python_retriever:Number of results: 30
```
After that it hangs and the LLM never returns a result.
Expected behavior
I would expect the LLM to return a response based on the tool call to the MCP server.
The sample code snippet from the docs works fine, but it does not with my MCP server (even though the server works fine both in Claude Desktop and in the MCP Inspector).
Device:
- OS: MacBook Air, Apple M3 chip, macOS 14.7.2
- IDE: Cursor
- Python: 3.12.9
Update: a colleague of mine had the same issue on their laptop, so this seems to be an issue with the MCP Python SDK (or the Gemini integration).
Does using `session.call_tool` work, without any LLM interaction?
No, I debugged this with Claude 4 Sonnet:
```
2. Testing direct tool call...
INFO:__main__:Testing direct tool call...
INFO:__main__:Starting MCP server using stdio transport
INFO:mcp.server.lowlevel.server:Processing request of type ListToolsRequest
INFO:__main__:Available tools: ['retrieve_articles']
INFO:mcp.server.lowlevel.server:Processing request of type CallToolRequest
INFO:__main__:Starting retrieval for query: test
INFO:__main__:Calling retriever.retrieve_articles...
INFO:app.implementations.retrievers.python_retriever:Search method: hybrid
INFO:app.implementations.retrievers.python_retriever:Initial query given to the retriever: test
INFO:elastic_transport.transport:POST search [status:200 duration:1.430s]
INFO:app.implementations.vector_databases.elastic_search_vector_database:ES response time via 'took' (ms): 1302
INFO:app.implementations.vector_databases.elastic_search_vector_database:ES response time (s): 1.5721080303192139
INFO:app.implementations.retrievers.python_retriever:Number of results: 30
INFO:__main__:Retrieved 5 articles
INFO:__main__:Returning result of length: 718 characters
```
It retrieves 5 articles because there's also a reranking step after the hybrid search, but it then hangs at this line:
```python
result = await asyncio.wait_for(
    session.call_tool("retrieve_articles", {"query": "test"}),
    timeout=60.0,
)
```
The MCP server itself is proprietary; it just returns articles from an Elasticsearch database as a long string. It works in Claude Desktop and with the MCP Inspector, so I assume something is wrong with the "stdio" implementation.
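For reference, the full direct-call test looks roughly like this. It is a sketch that reuses the same `server_params` as in the reproduction script above; the `retrieve_articles` tool name and the `{"query": "test"}` arguments come from the logs:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Same launch configuration as the reproduction script above
# (credential path and project name are placeholders)
server_params = StdioServerParameters(
    command="/opt/homebrew/bin/uv",
    args=["run", "mcp_server.py"],
    env={
        "GOOGLE_APPLICATION_CREDENTIALS": "(...)/application_default_credentials.json",
        "GOOGLE_CLOUD_PROJECT": "my-gcp-project",
    },
)


async def direct_call():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List the tools exposed by the server
            tools = await session.list_tools()
            print("Available tools:", [tool.name for tool in tools.tools])

            # Call the tool directly, with no LLM in the loop;
            # give up after 60 seconds instead of hanging forever
            result = await asyncio.wait_for(
                session.call_tool("retrieve_articles", {"query": "test"}),
                timeout=60.0,
            )
            print(result.content)


asyncio.run(direct_call())
```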
Fixed: it turned out this was due to an outdated version of the MCP SDK. It works fine on `mcp==1.9.3`.
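For anyone hitting the same hang, a quick sanity check (standard library only) of which `mcp` version the environment actually resolves:

```python
# Print the installed mcp version; the hang went away after upgrading to mcp==1.9.3
from importlib.metadata import version

print(version("mcp"))
```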