RFC: add `Tool.outputSchema` and `CallToolResult.structuredContent`
Adds support for strict validation of structured tool results.
- A `Tool` can now optionally provide an `outputSchema` property, containing a JSON schema that defines the structure of its output.
- `CallToolResult` adds a new `structuredContent` property, mutually exclusive with the `CallToolResult.content` property:
  - for Tools that do not declare an `outputSchema`, `result.structuredContent` will be absent, and `result.content` will be returned as before.
  - for Tools that declare an `outputSchema`, `result.structuredContent` will contain a string whose contents must validate against the schema, and `result.content` will be absent.
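As a rough illustration of these rules, the sketch below models the two properties with hypothetical, simplified TypeScript types (they are not the TypeScript SDK's definitions) and shows how a client might decide which property to read. It accepts `structuredContent` either as a JSON string, as stated above, or as an already-parsed object, as in the later revision mentioned further down the thread. The `get_weather` tool and the `interpretResult` helper are made up, and validation against the schema itself is omitted.

```typescript
// Hypothetical, simplified shapes for illustration only (not the SDK's types).
type JsonSchema = Record<string, unknown>;

interface Tool {
  name: string;
  inputSchema: JsonSchema;
  outputSchema?: JsonSchema; // new, optional
}

interface TextContent {
  type: "text";
  text: string;
}

interface CallToolResult {
  content?: TextContent[]; // other content types omitted for brevity
  structuredContent?: string | Record<string, unknown>;
  isError?: boolean;
}

type Interpreted =
  | { kind: "structured"; data: unknown }
  | { kind: "unstructured"; content: TextContent[] };

// The presence of outputSchema on the Tool decides which property to read;
// the two result properties are treated as mutually exclusive.
function interpretResult(tool: Tool, result: CallToolResult): Interpreted {
  if (tool.outputSchema === undefined) {
    if (result.structuredContent !== undefined) {
      throw new Error(`${tool.name}: structuredContent without an outputSchema`);
    }
    return { kind: "unstructured", content: result.content ?? [] };
  }
  if (result.structuredContent === undefined || result.content !== undefined) {
    throw new Error(`${tool.name}: expected structuredContent and no content`);
  }
  // Accept the JSON string form or an already-parsed object.
  const data =
    typeof result.structuredContent === "string"
      ? JSON.parse(result.structuredContent)
      : result.structuredContent;
  return { kind: "structured", data };
}

// Usage with a made-up tool:
const weather: Tool = {
  name: "get_weather",
  inputSchema: { type: "object" },
  outputSchema: {
    type: "object",
    properties: { tempC: { type: "number" } },
    required: ["tempC"],
  },
};
const r = interpretResult(weather, { structuredContent: '{"tempC": 21.5}' });
// r → { kind: "structured", data: { tempC: 21.5 } }
```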
Prototype for TypeScript SDK support: https://github.com/modelcontextprotocol/typescript-sdk/pull/454.
Design notes
This PR aims to provide simple, lightweight support for strict validation of tool result data whose structure can be entirely described by a single JSON schema. The approach here pairs a new Tool.outputSchema property with a new CallToolResult.structuredContent property, avoiding use of the CallToolResult.content array.
This approach leaves the path open for adding schema validation support to the much richer and more complex space of tools that make use of the full expressiveness of the CallToolResult.content array, via an additional Tool property. Support for these use cases has been proposed in #356, and is under active discussion there.
(After exploring possible ways of providing integrated support for both kinds of use cases with one set of protocol additions, it's clear that both will be better served by a disjoint approach: strict validation of statically typed data results can be accomplished with the simple additions provided here, and the subtleties arising from supporting the full space of CallToolResult.content shapes - see e.g. #415, in addition to #356 - can be addressed more naturally absent the need to support the use cases addressed here.)
Motivation and Context
For tools that return structured output, having a description of that structure available is useful for various tasks, including:
- Validating the structure of tool results (and performing a more informed examination of the values they contain, post-validation). Especially useful when interacting with untrusted servers.
- Considering `outputSchema`s (or their absence) when making decisions about which tools to expose to the model.
- Transforming tool results before forwarding content to the model (e.g. formatting, projecting).
- Making tool results available as structured data in coding environments.
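To make the first and third items concrete, here is a minimal sketch of validating a structured result and then projecting it before it reaches the model. It deliberately understands only a small, assumed subset of JSON Schema (`type`, `properties`, `required`, `items`); a real client would use a complete validator. `validate`, `project`, and the example schema are hypothetical.

```typescript
// Only a small, assumed subset of JSON Schema is understood here:
// "type", "properties", "required" and "items". Other keywords are ignored.
type Schema = {
  type?: "object" | "array" | "string" | "number" | "integer" | "boolean" | "null";
  properties?: { [key: string]: Schema };
  required?: string[];
  items?: Schema;
};

// Maps a parsed JSON value to its JSON Schema type name.
function typeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "object" | "string" | "number" | "boolean"
}

// Returns human-readable errors; an empty array means the value is valid.
function validate(value: unknown, schema: Schema, path = "$"): string[] {
  const actual = typeOf(value);
  if (schema.type !== undefined) {
    const ok =
      schema.type === "integer"
        ? typeof value === "number" && value % 1 === 0
        : actual === schema.type;
    if (!ok) return [`${path}: expected ${schema.type}, got ${actual}`];
  }
  const errors: string[] = [];
  if (actual === "object") {
    const obj = value as { [key: string]: unknown };
    for (const key of schema.required ?? []) {
      if (!(key in obj)) errors.push(`${path}: missing required "${key}"`);
    }
    const props = schema.properties ?? {};
    for (const key of Object.keys(props)) {
      if (key in obj) errors.push(...validate(obj[key], props[key], `${path}.${key}`));
    }
  }
  if (actual === "array" && schema.items !== undefined) {
    const items = schema.items;
    (value as unknown[]).forEach((item, i) => {
      errors.push(...validate(item, items, `${path}[${i}]`));
    });
  }
  return errors;
}

// Keeps only the properties the schema declares, e.g. before forwarding a
// result to the model.
function project(value: { [key: string]: unknown }, schema: Schema): { [key: string]: unknown } {
  const out: { [key: string]: unknown } = {};
  for (const key of Object.keys(schema.properties ?? {})) {
    if (key in value) out[key] = value[key];
  }
  return out;
}

const personSchema: Schema = {
  type: "object",
  properties: { name: { type: "string" }, age: { type: "integer" } },
  required: ["name"],
};
validate({ name: "John Doe", age: 42, internalId: 7 }, personSchema); // → []
validate({ age: "42" }, personSchema);
// → ['$: missing required "name"', '$.age: expected integer, got string']
project({ name: "John Doe", age: 42, internalId: 7 }, personSchema);
// → { name: "John Doe", age: 42 }
```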
How Has This Been Tested?
No tests yet.
Breaking Changes
Optional new property that introduces a new behavior, not a breaking change.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [X] Documentation update
Checklist
- [X] I have read the MCP Documentation
- [X] My code follows the repository's style guidelines
- [ ] New and existing tests pass locally
- [ ] I have added appropriate error handling
- [X] I have added or updated documentation as needed
Additional context
A couple of comments on this as I prepare to undraft #223 for RFC:
- I've updated #223 to indicate that Servers that support structured output should advertise `generates: ["application/json"]`.
- Is there any consideration for the Server returning a `TextResourceContents` with a mimeType of `application/json`? I think this would be a more deliberate action by the Server in this scenario.
[update] The specific proposal would be to return a CallToolResult as follows:
{
"jsonrpc": "2.0",
"id": "abc123",
"result": {
"content": [
{
"type": "resource",
"resource": {
"uri": "file:///example/data.json",
"mimeType": "application/json",
"text": "{\"name\":\"John Doe\",..... and so on }}"
}
}
],
"isError": false
}
}
With guidance that Servers returning a Structured Response MUST return a CallToolResult containing one EmbeddedResource of type application/json.
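Under this proposal a client would locate the structured payload by mime type rather than through a dedicated property. A sketch, assuming simplified versions of the content types from the example above (blob resources and the other content types are omitted) and a hypothetical `extractStructured` helper:

```typescript
// Simplified shapes matching the example above (blob resources and other
// content types are omitted).
interface TextResourceContents {
  uri: string;
  mimeType?: string;
  text: string;
}

interface EmbeddedResource {
  type: "resource";
  resource: TextResourceContents;
}

interface TextContent {
  type: "text";
  text: string;
}

type Content = TextContent | EmbeddedResource;

// Under the proposed guidance a structured response carries exactly one
// application/json EmbeddedResource; parse and return its text.
function extractStructured(content: Content[]): unknown {
  const json = content.filter(
    (c): c is EmbeddedResource =>
      c.type === "resource" && c.resource.mimeType === "application/json",
  );
  if (json.length !== 1) {
    throw new Error(`expected one application/json resource, found ${json.length}`);
  }
  return JSON.parse(json[0].resource.text);
}

const data = extractStructured([
  {
    type: "resource",
    resource: {
      uri: "file:///example/data.json",
      mimeType: "application/json",
      text: '{"name":"John Doe"}',
    },
  },
]);
// data → { name: "John Doe" }
```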
Having `outputSchema` restrict you to returning a single text content entry whose text validates against the schema feels oddly restrictive. That approach makes annotations largely pointless, and I can think of plenty of cases where one would want to have multiple content entries that would be possible with #356:
- Document processing:
  - Have an `outputSchema` that is treated as definitions for multiple document types
  - Return content entries for:
    - `ImageContent` - Generated thumbnail for the document
    - `TextContent` - Plain text contents of the document
    - `DataContent` - Structured data, with `schema` referencing one of the definitions in `outputSchema`
- Multi-entity type search
  - Have an `outputSchema` that is treated as definitions for multiple entity types
  - Return multiple content entries with `schema` refs and annotations for relevance/importance
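The document-processing case relies on an `outputSchema` that acts as a set of named definitions, with each data entry pointing at one of them. The sketch below assumes the definitions live under `$defs` and that entries reference them with local `#/$defs/<name>` pointers; `DataContent`, its `schema` field, and `resolveDef` are hypothetical, modeled loosely on the #356 discussion rather than on any published spec.

```typescript
// Hypothetical shapes: DataContent and its `schema` field are modeled loosely
// on the #356 discussion, not on a published spec.
interface JsonSchema {
  $defs?: { [name: string]: JsonSchema };
  [keyword: string]: unknown;
}

interface DataContent {
  type: "data";
  data: unknown;
  schema: string; // local reference such as "#/$defs/Invoice"
}

// Resolves a local "#/$defs/<name>" reference against the tool's outputSchema.
// JSON Pointer escapes (~0, ~1) and nested paths are not handled.
function resolveDef(outputSchema: JsonSchema, ref: string): JsonSchema {
  const prefix = "#/$defs/";
  if (ref.indexOf(prefix) !== 0) throw new Error(`unsupported ref: ${ref}`);
  const name = ref.slice(prefix.length);
  const defs = outputSchema.$defs ?? {};
  if (!Object.prototype.hasOwnProperty.call(defs, name)) {
    throw new Error(`unknown definition: ${name}`);
  }
  return defs[name];
}

// An outputSchema acting as definitions for two document types.
const documentSchema: JsonSchema = {
  $defs: {
    Invoice: { type: "object", required: ["total"] },
    Receipt: { type: "object", required: ["paidAt"] },
  },
};
const entry: DataContent = { type: "data", data: { total: 12 }, schema: "#/$defs/Invoice" };
resolveDef(documentSchema, entry.schema);
// → { type: "object", required: ["total"] }
```

The resolved definition would then be handed to whatever validator the client uses to check `entry.data`.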
Another drawback I see is the lack of ability to dynamically define the structure/schema for response content. There are certainly cases where the output schema may not be known ahead of time, but would still be useful for the client or LLM consuming the content; it also enriches the capability of sampling and prompt messages.
Lastly, going with this approach, extending its functionality in the future would likely represent a breaking change since it largely goes against the implied design pattern of CallToolResult having an arbitrary number of content entries.
@lukaswelinder first of all, my apologies for putting up this PR without first participating in the discussion on #356 - I only saw it as I was writing the PR comments for this, but hadn't had a chance to look properly yet. Definitely didn't mean to step on your ongoing work.
I'll respond to your comment a bit later (not at keyboard right now) and also make comments on #356. Meanwhile I'll move this to draft, pending further discussion.
@evalstate thanks for the heads up - will come back to this after we see what comes out of the discussion on #356 , per previous comment
@bhosmer-ant No offense taken, just glad to see there is motivation here. Input and feedback on #356 would be great.
LGTM, thanks for shepherding this!
It's still not clear to me how this is better than the additional 3-4 lines of code required to achieve this with the current spec here: https://gist.github.com/evalstate/e49cb163297c1ab940fb8a98e31947ed - the motivation and context isn't clear to me as a Host application developer.
I'm also not at all keen on the branching logic of the CallToolResults changing based on whether a field was present on the Tool description. Is that a necessary change?
At this point, if we want to do this shouldn't we just introduce a new type of tool instead (e.g. StructuredTool)?
It's still not clear to me how this is better than the additional 3-4 lines of code required to achieve this with the current spec here: https://gist.github.com/evalstate/e49cb163297c1ab940fb8a98e31947ed - the motivation and context isn't clear to me as a Host application developer.
Recapping the previous discussion, so apologies for brevity, but: it's better because it facilitates the structured data use case in a simple, direct way, without need for the extra ceremony and hops involved in routing the result through an EmbeddedResource.
I'm also not at all keen on the branching logic of the CallToolResults changing based on whether a field was present on the Tool description. Is that a necessary change?
Yeah, the triggering of validation based on the presence of an outputSchema in the tool definition is a key feature.
At this point, if we want to do this shouldn't we just introduce a new type of tool instead (e.g. StructuredTool)?
I think that would introduce fragmentation at the top level of the concept hierarchy that we don't want.
@ihrpr fyi new rev makes structuredContent an object, and updates the docs w/compatibility language (and a better example). (TS SDK example updated too)
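A server-side sketch of what the compatibility language might look like in practice, assuming it asks servers that return the `structuredContent` object to also include the same data serialized as JSON in a text content entry (the source of the "doubling the length of returned content" concern below). The types and the `structuredResult` helper are hypothetical simplifications, not the TS SDK's API.

```typescript
// Hypothetical minimal result shape for illustration.
interface TextContent {
  type: "text";
  text: string;
}

interface CallToolResult {
  content: TextContent[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Builds a result that newer clients can read from structuredContent while
// older clients still see the same data as serialized JSON text.
function structuredResult(data: Record<string, unknown>): CallToolResult {
  return {
    content: [{ type: "text", text: JSON.stringify(data) }],
    structuredContent: data,
  };
}

const result = structuredResult({ tempC: 21.5, conditions: "cloudy" });
// result.content[0].text → '{"tempC":21.5,"conditions":"cloudy"}'
```

The duplication is the trade-off: older clients keep working unchanged, at the cost of sending the payload twice.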
OK - I'll just note my outstanding concerns on this one - not expecting a response - just adding my perspective as a Host application developer.
- MCP Server SDK. Introduction of return type polymorphism based on the presence of the Tool outputSchema will make the developer experience around tool definition and implementation more complex than necessary.
- MCP Client SDK. Return type polymorphism needs to be handled by the SDK along with additional validation, meaning changes will be needed for implementation, and requiring the Host integrator to special-case the new "structured" return type.
- Compatibility. Forward/Backward compatibility is managed by the MCP Server itself rather than handled by a stated convention within the SDK. This gives a large number of possibilities to integrate and test for - and opens challenges (potentially including security) when there is a content mismatch, as well as potentially doubling the length of returned content.
- Consistency. Currently Tools, Prompts and Resources have a logical consistency between their types. This adds a unique Tool-only condition that can't otherwise be represented within the MCP protocol (conceptual fragmentation).
- JSON Specific. The use of mime types for the schema and payload would allow the use of non-JSON structures if desired.
- Host Application Development. For someone building a generic Host application it's still not immediately obvious what the benefit is in receiving the data in structured form. Since both schema and content are supplied by the Server, the "interacting with untrusted servers" motivation isn't obviously improved here. Without an identifying uri or prior knowledge of the server/schema this is still "just JSON tokens". On this basis, this change brings extra effort to me as an integrator, with no clear benefit.