
HTTP Request Node Does Not Retrieve File Extension from Content-Disposition Header

Open EcoleKeine opened this issue 1 year ago • 9 comments

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.15.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Use an "http-request" node to download a .docx file.
  2. Pass the downloaded file to a "doc extractor" node.

✔️ Expected Behavior

The HTTP Request node downloads the file and keeps its .docx extension.

❌ Actual Behavior

"doc extractor" node error: Unsupported Extension Type: .bin

Http request output:

{
  "status_code": 200,
  "body": "",
  "headers": {
    "server": "openresty",
    "date": "Fri, 10 Jan 2025 08:24:28 GMT",
    "content-type": "application/octet-stream; charset=UTF-8",
    "content-length": "19121",
    "connection": "keep-alive",
    "accept-ranges": "bytes",
    "etag": "64ba7c8713b64b689feeb4cc2b1f4954",
    "content-disposition": "attachment; filename=\"%e4******%95.docx\"; charset=UTF-8"
  },
  "files": [
    {
      "dify_model_identity": "__dify__file__",
      "id": null,
      "tenant_id": "3e581dc4-d40f-412b-b1de-dcc39fa288b4",
      "type": "custom",
      "transfer_method": "tool_file",
      "remote_url": null,
      "related_id": "1c9dd6a3-d0e2-4fd6-b899-222fc34eb366",
      "filename": "32b96deebef6444bad37bab894fb5d37.bin",
      "extension": ".bin",
      "mime_type": "application/octet-stream; charset=UTF-8",
      "size": 19121,
      "url": "/files/tools/1c9dd6a3-d0e2-4fd6-b899-222fc34eb366.bin?timestamp=1736497468&nonce=d2a5bb1c5141ab6b91ae877a8550d894&sign=5b7SpNwVHSfBruJ8zJc8AuGMAqOiCZfMRgWf0EnHuRg="
    }
  ]
}

"doc extractor" data handle :

{
  "documents": [
    {
      "dify_model_identity": "__dify__file__",
      "id": null,
      "tenant_id": "3e581dc4-d40f-412b-b1de-dcc39fa288b4",
      "type": "custom",
      "transfer_method": "tool_file",
      "remote_url": null,
      "related_id": "1c9dd6a3-d0e2-4fd6-b899-222fc34eb366",
      "filename": "32b96deebef6444bad37bab894fb5d37.bin",
      "extension": ".bin",
      "mime_type": "application/octet-stream; charset=UTF-8",
      "size": 19121,
      "url": "/files/tools/1c9dd6a3-d0e2-4fd6-b899-222fc34eb366.bin?timestamp=1736497468&nonce=e4c5844225ee8e0b11424aaaad307e1c&sign=ZkMjOv1-A9YpkQ3iBSn2Yu8ANZO9AbMarklwQh9IasU="
    }
  ]
}

EcoleKeine avatar Jan 10 '25 08:01 EcoleKeine

The "Unsupported Extension Type: .bin" error occurs because the "doc extractor" node does not support the .bin extension. The file downloaded by the "Http-request" node is being saved with a .bin extension due to its MIME type being application/octet-stream, which is a generic binary type.

To resolve this issue, you can modify the process to correctly identify the file type and extension. Here are some steps you can take:

  1. Check the Content-Disposition Header: The Content-Disposition header in your response includes a filename with a .docx extension. Use this filename to set the correct extension for the file.

  2. Modify the File Handling Logic: Ensure that the logic in the "Http-request" node or subsequent processing steps correctly assigns the .docx extension to the file based on the Content-Disposition header.

  3. Update the File Information: Before passing the file to the "doc extractor" node, update the file's metadata to reflect the correct extension and MIME type.

By ensuring the file is correctly identified as a .docx file, the "doc extractor" node should be able to process it without errors, as .docx is a supported extension [1].
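
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not Dify's actual implementation: the function names `filename_from_content_disposition` and `resolve_extension` are hypothetical, and the sketch assumes the response headers arrive as a mapping with lowercase keys, as in the HTTP request output shown in the report.

```python
import mimetypes
import os
import re
from typing import Mapping, Optional
from urllib.parse import unquote


def filename_from_content_disposition(header: str) -> Optional[str]:
    """Return the filename carried by a Content-Disposition value, or None."""
    # Prefer the extended form: filename*=UTF-8''percent-encoded-name
    match = re.search(r"filename\*\s*=\s*([^']*)'[^']*'([^;]+)", header, re.IGNORECASE)
    if match:
        charset = match.group(1).strip() or "utf-8"
        return unquote(match.group(2).strip(), encoding=charset)
    # Fall back to the plain form: filename="name" or filename=name
    match = re.search(r'filename\s*=\s*(?:"([^"]*)"|([^;]+))', header, re.IGNORECASE)
    if match:
        raw = match.group(1) if match.group(1) is not None else match.group(2)
        # Some servers percent-encode the plain form too, as in this report.
        return unquote(raw.strip())
    return None


def resolve_extension(headers: Mapping[str, str]) -> str:
    """Pick an extension: Content-Disposition first, then Content-Type, then .bin."""
    name = filename_from_content_disposition(headers.get("content-disposition", ""))
    if name:
        ext = os.path.splitext(name)[1]
        if ext:
            return ext.lower()
    # Drop parameters such as "; charset=UTF-8" before the MIME lookup.
    mime = headers.get("content-type", "").split(";", 1)[0].strip().lower()
    return mimetypes.guess_extension(mime) or ".bin"


# Headers taken from the report above (the filename is masked in the original).
headers = {
    "content-type": "application/octet-stream; charset=UTF-8",
    "content-disposition": 'attachment; filename="%e4******%95.docx"; charset=UTF-8',
}
print(resolve_extension(headers))
```

Checking the Content-Disposition filename before the MIME type matters here because application/octet-stream carries no information about the real file type, so a MIME-only lookup can do no better than a generic extension.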


dosubot[bot] avatar Jan 10 '25 08:01 dosubot[bot]

I will work on this issue. Please assign it to me if possible.

halogen22 avatar Jan 11 '25 08:01 halogen22

https://github.com/langgenius/dify/pull/12644#issuecomment-2585286631

kurokobo avatar Jan 11 '25 14:01 kurokobo

Hi, @EcoleKeine. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.

Issue Summary:

  • The HTTP Request Node misidentifies .docx files as .bin due to MIME type application/octet-stream.
  • I suggested checking the Content-Disposition header and updating file metadata.
  • Halogen22 showed interest in resolving the issue and requested assignment.
  • Kurokobo referenced a related pull request.

Next Steps:

  • Please confirm if this issue is still relevant to the latest version of the Dify repository by commenting here.
  • If no further updates are provided, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

dosubot[bot] avatar Feb 12 '25 16:02 dosubot[bot]

dosu, the pull request (#12653) to fix this problem has not been merged.

EcoleKeine avatar Feb 23 '25 08:02 EcoleKeine

This problem has not been fixed, has it?

fq393 avatar Mar 08 '25 05:03 fq393

I'm seeing the same behavior with PDF files.

I'm running self-hosted V1.0.0 Dify

pierrejo avatar Mar 11 '25 10:03 pierrejo

I ran into the same problem with CSV files. I'm running self-hosted Dify V1.0.0.

sirizhou avatar Mar 12 '25 15:03 sirizhou

I ran into the same problem with Excel files; the returned extension is .bin.

kele-711 avatar Mar 13 '25 00:03 kele-711

I use Dify web, version 1.4.2, and see the same issue with YAML:

{
  "status_code": 200,
  "body": "",
  "headers": {
    "accept-ranges": "bytes",
    "content-disposition": "attachment; filename=name.yaml",
    "content-length": "2254517",
    "content-type": "application/yaml",
    "date": "Tue, 01 Jul 2025 15:10:57 GMT",
    "etag": "\"54af2a6578400c9ef6b9d1ab6ce077ba\"",
    "last-modified": "Thu, 26 Jun 2025 08:15:46 GMT"
  },
  "files": [
    {
      "dify_model_identity": "__dify__file__",
      "id": null,
      "tenant_id": "b6d6745b-7467-4397-b6b9-c5b4c8aafaa9",
      "type": "custom",
      "transfer_method": "tool_file",
      "remote_url": null,
      "related_id": "273e93f2-dc72-43f1-b456-c98749c626be",
      "filename": "0cbfa19996a44f48b73973c5cf238333.bin",
      "extension": ".bin",
      "mime_type": "application/yaml",
      "size": 2254517,
      "url": "/files/tools/273e93f2-dc72-43f1-b456-c98749c626be.bin?timestamp=1751382658&nonce=a99757666687d1a407be2746dcf155c9&sign=aaa"
    }
  ]
}

I changed the content type to application/x-yaml, text/x-yaml, and text/yaml, but still no luck.

prd-nguyet-le avatar Jul 01 '25 15:07 prd-nguyet-le

I ran into the same problem with XML; the returned extension is .xsl (Dify 1.6.0). @EcoleKeine

Http-request file output:

{
  "documents": [
    {
      "dify_model_identity": "__dify__file__",
      "id": null,
      "tenant_id": "9024b8ec-023a-4beb-b01f-22db8b5ca2ec",
      "type": "custom",
      "transfer_method": "tool_file",
      "remote_url": null,
      "related_id": "cd210776-ced6-4198-94bb-878366477478",
      "filename": "d87748f0255544d59f580cb30c6565be.xsl",
      "extension": ".xsl",
      "mime_type": "application/xml",
      "size": 151271,
      "url": "https://upload.dify.ai/files/tools/cd210776-ced6-4198-94bb-878366477478.xsl?timestamp=1752916630&nonce=a27208ca6ff66142a29df25912dd4499&sign=Sc1Df_87xnv8F5zVOJAvpB6jzieuMQZHsj4hfvgSf7E="
    }
  ]
}

"doc extractor" data handle error: Unsupported Extension Type: .xsl

Input:

{
  "ransomeRss": "https://www.ransomware.live/rss",
  "NewsList": null,
  "sys.files": [],
  "sys.user_id": "a0d6e238-0154-4df4-9d4a-731b5bc1b10d",
  "sys.app_id": "20a597f7-acf9-41ea-97b1-84066130f4a3",
  "sys.workflow_id": "fd95cd7d-a1bb-472a-b55c-85985417c4c9",
  "sys.workflow_run_id": "17ee5c37-cf2d-42f0-8c11-da469c9444f5"
}

ChenMartin avatar Jul 19 '25 09:07 ChenMartin

@ChenMartin You should @-mention a project member, or better yet, open a new issue.

EcoleKeine avatar Jul 21 '25 01:07 EcoleKeine