Dify: The LLM node cannot read a file from a custom field of type "File" when using a multimodal model.

Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.13.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

1. Add a file field in the start node.
2. Add an LLM node and select a multimodal model, such as gpt-4o-mini.
3. Enable the vision mode and input the files field.
4. Run the test; the file output remains an empty array.

✔️ Expected Behavior

The output JSON should include the file content.

❌ Actual Behavior

In the output JSON format, the file is an empty array.

Dec 14 '24 06:12 kevintsai1202

@kevintsai1202 Hello there! I'm here to help you with any bugs, questions, or contributions. Let's work together to solve the issue you're facing.

The issue you're experiencing with the LLM Node not reading files with a custom field type of "File" in a multimodal model is due to the current limitations of the Dify platform. The LLM Node only supports recognizing image files and does not support reading document files directly. This means that when you input files, only image files are processed by the LLM Node, and other file types remain unprocessed, resulting in an empty array in the output JSON [1].

To address this, you can use a List Operation node to preprocess the array of file variables. This node can sort and filter the files, sending different file types to their corresponding processing nodes. For image files, ensure that the vision feature is enabled in the LLM Node to allow processing [2].
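
As a rough illustration of the filtering idea only (this is not Dify's implementation, and it does not call any Dify API), the routing step can be thought of as grouping file records by type. The record fields `filename` and `type` and the helper `split_by_type` are hypothetical names chosen for this sketch.

```python
from typing import Any

# Hypothetical file records. The field names ("filename", "type") are
# assumptions for illustration, not Dify's internal file schema.
files = [
    {"filename": "invoice.png", "type": "image"},
    {"filename": "report.pdf", "type": "document"},
    {"filename": "photo.jpg", "type": "image"},
]


def split_by_type(items: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group file records by their "type" field, preserving input order."""
    groups: dict[str, list[dict[str, Any]]] = {}
    for item in items:
        groups.setdefault(item.get("type", "unknown"), []).append(item)
    return groups


groups = split_by_type(files)

# Image files would be routed to a vision-enabled LLM node.
images = groups.get("image", [])

# Everything else would need a different processing node.
others = [f for f in files if f.get("type") != "image"]
```

If `images` comes out empty for an input that should contain an image, the problem is likely upstream of the LLM node (for example, in how the file variable is typed or passed), which is worth checking before looking at the model configuration.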

Dec 14 '24 06:12 dosubot[bot]

Using the List Operation node to filter still doesn't produce input or file output.

Dec 14 '24 07:12 kevintsai1202

How about putting the file variable directly in the system prompt?

Dec 14 '24 15:12 hjlarry

Adding a binary file to the prompt results in an error at execution.

Dec 14 '24 16:12 kevintsai1202

The image file will be converted, as in: https://github.com/langgenius/dify/issues/11567#issuecomment-2537754156

Dec 15 '24 01:12 hjlarry

Hi, @kevintsai1202. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary

You reported a bug with the LLM Node failing to process files with a custom field type "File" using multimodal models.
I explained the limitation to image files and suggested using a List Operation node, which did not resolve the issue.
Hjlarry suggested using file variables in system prompts, which caused errors with binary files, and mentioned image file conversion.

Next Steps

Could you confirm if this issue is still relevant with the latest version of the Dify repository? If so, please comment to keep the discussion open.
If there is no further activity, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

Jan 22 '25 16:01 dosubot[bot]

@dosubot

Version 0.15.2 has been tested and is working properly. This issue can be closed.

Jan 23 '25 03:01 kevintsai1202

Thank you for closing the issue, kevintsai1202! We appreciate your help in keeping the repository organized.

Jan 23 '25 03:01 dosubot[bot]

Hello, I ran into the same problem on Dify version 0.15.2: the final output field is empty. Do you know what causes it? My input was an image, and I wanted to use an LLM directly to extract the information and output it in JSON format to fill in two tables.

Jul 25 '25 01:07 awslhh8