[Feature] Support making use of the native capabilities of models for structured output.
Search before asking
- [x] I searched in the issues and found nothing similar.
Description
Currently, when `output_schema` is configured, the react agent directly prompts the model to use a specific format and then uses an output parser to extract the structured response from the raw model output. This is the only option for models that don't support tool calling or JSON mode.
For models that do support tool calling or JSON mode, we can make use of these capabilities to generate structured output.
See https://python.langchain.com/docs/how_to/structured_output/ for details.
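For context, a provider-native structured-output request looks roughly like the sketch below. The `response_format` shape follows OpenAI's JSON-schema mode; the schema itself is a made-up illustration, not an actual Flink-Agents `output_schema`:

```python
# Illustrative JSON Schema an agent's output_schema might describe.
output_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
    "additionalProperties": False,
}

# Extra request parameters for a provider-native structured output call,
# using OpenAI's response_format shape as the example.
request_extra = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "agent_output",
            "schema": output_schema,
            "strict": True,
        },
    }
}
print(request_extra["response_format"]["type"])  # json_schema
```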
Are you willing to submit a PR?
- [ ] I'm willing to submit a PR!
@wenjin272 Hello, I have some questions about this requirement. Question: how should we handle the capability differences between different models?
| Model | Tool Calling | JSON Mode | Structured Output API |
|---|---|---|---|
| OpenAI GPT-4 | ✅ | ✅ | ✅ (response_format) |
| Anthropic Claude | ✅ | ⚠️ (beta) | ❌ |
| Ollama (Qwen/Llama) | ✅ | ❓ | ❓ |
| Tongyi Qwen | ✅ | ❓ | ❓ |
Questions:
- Is it necessary to implement a separate adaptation for each ChatModel?
- What should the capability detection mechanism be (runtime vs. configuration time)?
- What is the degradation strategy (JSON mode failure → tool calling → prompt)?
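The degradation strategy mentioned above could be sketched as a simple capability-based fallback. This is a minimal sketch; the capability flags, model names, and `Strategy` enum are all hypothetical, not existing Flink-Agents APIs:

```python
from enum import Enum


class Strategy(Enum):
    NATIVE = "native_structured_output"
    TOOL_CALLING = "tool_calling"
    PROMPT = "prompt"


# Hypothetical per-model capability flags. In practice these could come from
# configuration or from probing the provider at setup time.
MODEL_CAPS = {
    "openai-gpt-4": {"native": True, "tools": True},
    "anthropic-claude": {"native": False, "tools": True},
    "legacy-llm": {"native": False, "tools": False},
}


def pick_strategy(model: str) -> Strategy:
    """Degrade gracefully: native structured output -> tool calling -> prompt."""
    caps = MODEL_CAPS.get(model, {})
    if caps.get("native"):
        return Strategy.NATIVE
    if caps.get("tools"):
        return Strategy.TOOL_CALLING
    return Strategy.PROMPT


print(pick_strategy("anthropic-claude").value)  # tool_calling
```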
Besides that, there's another problem: the semantics of using tool calling for structured output.
Question: Is it reasonable to use tool calling to implement structured output? It is technically feasible: define a virtual tool named `format_output`:
```python
{
    "name": "format_output",
    "description": "Format the final response",
    "parameters": OutputData.model_json_schema(),
}
```
However, there are problems:
- Semantic ambiguity: a tool should "perform an action," not "format output."
- User experience: this "virtual tool" will appear in the tools list.
- Conflict with real tools: the React Agent already has tool-invocation capabilities.
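The user-experience concern could be mitigated by keeping the virtual tool internal and filtering it out of any user-facing tool listing. A minimal sketch, with all names hypothetical:

```python
# Hypothetical internal tool used only for structured-output extraction.
VIRTUAL_TOOL = {
    "name": "format_output",
    "description": "Format the final response",
}


def user_visible_tools(tools: list[dict]) -> list[dict]:
    """Hide internal tools before exposing the tool list to users or UIs."""
    return [t for t in tools if t["name"] != VIRTUAL_TOOL["name"]]


tools = [{"name": "search", "description": "Web search"}, VIRTUAL_TOOL]
print([t["name"] for t in user_visible_tools(tools)])  # ['search']
```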
You can refer to the LangChain structured output doc: https://docs.langchain.com/oss/python/langchain/structured-output.
Roughly speaking, the performance of structured output strategies can be ranked as follows:
- Use the native structured output capabilities provided by the model provider (currently only OpenAI and Grok).
- Use tool calling to achieve the same result.
- Use a prompt and examples to instruct the LLM.
Flink-Agents should provide all of these strategies.
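The lowest-ranked strategy (prompt plus output parser, which is what the react agent does today) can be sketched as below. The schema, prompt wording, and parser are illustrative only, not the actual Flink-Agents implementation:

```python
import json

# Illustrative schema to embed in the prompt.
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}, "score": {"type": "number"}},
}


def build_structured_prompt(question: str) -> str:
    """Instruct the model via prompt text plus an example (strategy 3)."""
    return (
        f"{question}\n\n"
        "Respond ONLY with JSON matching this schema:\n"
        f"{json.dumps(schema)}\n"
        'Example: {"answer": "...", "score": 0.9}'
    )


def parse_response(raw: str) -> dict:
    """Naive output parser: extract the JSON object from raw model text."""
    start, end = raw.find("{"), raw.rfind("}") + 1
    return json.loads(raw[start:end])


print(parse_response('Sure! {"answer": "42", "score": 0.9}')["answer"])  # 42
```

The native and tool-calling strategies skip the fragile parsing step entirely, which is the main reason they rank higher.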
In addition, these questions seem to be rhetorical ones posed by an AI programming assistant. I hold a positive attitude toward using AI for programming assistance, but developers should ensure their own understanding of the problems and the quality of the generated code.
Of course, for a new project like this, an AI programming assistant is a good partner for understanding the codebase. For submitted pull requests, I will make sure they meet the requirements and pass testing.
Here are two documents we can refer to:
- https://github.com/langchain-ai/langchain/blob/v0.3/docs/docs/how_to/structured_output.ipynb
- https://github.com/langchain-ai/langgraph/blob/main/docs/docs/how-tos/react-agent-structured-output.ipynb