haystack icon indicating copy to clipboard operation
haystack copied to clipboard

Investigate support for images in `ToolCallResult`

Open anakin87 opened this issue 8 months ago • 2 comments

Currently, providers that expose a tool role for messages (OpenAI, Gemini, Ollama) do not support images in the tool result.

Conversely, Anthropic has no tool role and requires tool results to be included in a user message, which can contain images. Since multimodal support in Amazon Bedrock is mostly based on Anthropic, Bedrock supports this use case as well. For a Bedrock example, see https://github.com/deepset-ai/haystack-experimental/pull/307#issuecomment-2903865141.

Supporting this use case would require refactoring our ToolCallResult dataclass. If more model providers begin to allow this pattern, we should investigate and evaluate this refactoring.

This doesn't currently seem like a high priority, because a simple workaround is to create a user message with the image returned by the tool.

anakin87 avatar May 23 '25 10:05 anakin87

a simple workaround is to create a user message with the image returned by the tool

I am afraid that this might not be that simple, due to the fact that Anthropic utilizes special content types for tool results different from the regular user messages. Thus, we still need to track whether the message is from tool or from user.

Moreover, if my understanding is correct, Gemini models should accept any json-serializable format, not only string. (reference: https://cloud.google.com/vertex-ai/generative-ai/docs/reference/rest/v1beta1/Content#FunctionResponse)

IMO, currently there is no clear consensus about the tool return format and types. So ideally we should be a little flexible, so that it is ultimately up to the model to decide how the tool result can be used. I do not have a clear plan at the moment though.

LastRemote avatar Jun 12 '25 09:06 LastRemote

This topic re-emerged when experimenting with MCP tools that can return images.

When multimodality lands on Haystack, we should find a convenient way to make the LLM understand these results. We might also explore the user message idea indicated above.

anakin87 avatar Jul 04 '25 16:07 anakin87