Adding knowledge throws error (PDF)
Self Checks
- [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.6.3
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
Adding new knowledge throws an error and the API server quits. I can't figure out whether this is a problem on my local machine, as the same PDF can be uploaded to Dify Cloud. It had not happened until version 0.6.3.
✔️ Expected Behavior
Upload the PDF and embed it into the vector DB.
❌ Actual Behavior
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:31] "POST /console/api/files/upload?source=datasets HTTP/1.1" 201 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "OPTIONS /console/api/workspaces/current/models/model-types/rerank HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "OPTIONS /console/api/workspaces/current/default-model?model_type=rerank HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "OPTIONS /console/api/datasets/process-rule HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "OPTIONS /console/api/datasets/indexing-estimate HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "OPTIONS /console/api/datasets/process-rule HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "OPTIONS /console/api/datasets/indexing-estimate HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "GET /console/api/datasets/process-rule HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "GET /console/api/workspaces/current/default-model?model_type=rerank HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "GET /console/api/datasets/process-rule HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "GET /console/api/workspaces/current/model-providers/cohere/icon_small/en_US?_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyX2lkIjoiOGM0ZDRjMjUtNGYyMC00YWM0LTlkZDMtYWJlMzI3NTkyMmNjIiwiZXhwIjoxNzE1ODczMjgyLCJpc3MiOiJDTE9VRCIsInN1YiI6IkNvbnNvbGUgQVBJIFBhc3Nwb3J0In0.dZGDQXBOtu6yR5F2Z7h85KJbmVdf1KYCyqzNRC2GGmE HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2024 21:35:38] "GET /console/api/workspaces/current/models/model-types/rerank HTTP/1.1" 200 -
python(2929,0x2d8f07000) malloc: *** error for object 0x1272cdb08: pointer being freed was not allocated
python(2929,0x2d8f07000) malloc: *** set a breakpoint in malloc_error_break to debug
`Unhandled Runtime Error
TypeError: Failed to fetch
Source
service/base.ts (299:17) @ fetch
297 | }),
298 | new Promise((resolve, reject) => {
299 | globalThis.fetch(urlWithPrefix, options as RequestInit)
    | ^
300 | .then((res) => {
301 | const resClone = res.clone()
302 | // Error handler`
The errors you're encountering suggest a couple of potential issues. The malloc error indicates a memory management problem, possibly due to the size of the PDF you're uploading or inefficient memory usage in the code. Ensure your server has sufficient memory, especially for large PDF files, and consider increasing the memory limit if necessary.
The Failed to fetch error points to an issue with the HTTP request made during file upload. This could be because the file size exceeds the server's maximum file size limit. The default upload limit for many servers is quite low (e.g., 1MB for NGINX), so you might need to adjust the client_max_body_size directive in your server's configuration file to allow larger uploads. Remember to restart the server after making changes to ensure they take effect.
Additionally, Dify version 0.6.3 has a file size limit of 15MB for uploads. If your PDF exceeds this, reducing its size before uploading could resolve the issue. If problems persist after checking these areas, further details about the PDF and any additional error messages would be helpful for diagnosing the issue more accurately.
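To rule out the size limit mentioned above before retrying the upload, the file can be checked locally. This is only a minimal sketch: the 15 MB figure is taken from the reply above and may not match your deployment, and the function name `exceeds_upload_limit` is hypothetical, not part of Dify.

```python
import os

# Assumption: 15 MB limit as stated in the reply above; adjust to your own configuration.
UPLOAD_LIMIT_MB = 15


def exceeds_upload_limit(path: str, limit_mb: int = UPLOAD_LIMIT_MB) -> bool:
    """Return True if the file at `path` is larger than `limit_mb` megabytes."""
    size_bytes = os.path.getsize(path)
    return size_bytes > limit_mb * 1024 * 1024
```

If this returns False for the PDF in question, the upload size is probably not the cause and the malloc crash needs to be investigated separately.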
Sources
- Local Docker deployment: uploading a PDF file fails when creating a dataset
- api/controllers/service_api/dataset/document.py
- web/app/components/datasets/create/file-uploader/index.tsx
The error can be reproduced by having a knowledge base that uses Cohere reranking v3 for hybrid search, and then setting the embedding model to Cohere ranking v2 in the system model settings.
I'm not sure what problem you are currently encountering. From the description, it seems that your PDF failed to be parsed into the knowledge base. However, judging from your last reply, it seems that you encountered a search error when using Cohere reranking v3. @AndyMik90 If it is a problem with PDF parsing, our cloud service uses the Unstructured tool, so you need to configure Unstructured locally. For specific steps, please refer to our documentation.
Hi JohnJyong, I have Unstructured configured locally, and it had been working up until the latest update. As I mentioned in my last comment, the problem seems to occur during indexing estimation, where reranking is triggered, and is caused by a mismatch between v2 and v3: "The error can be reproduced by having a knowledgebase with cohere reranking v3 as hybrid search. Then setting the embedding model to cohere ranking v2 in system model settings."
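The mismatch described above can be expressed as a simple comparison of two settings. The sketch below is purely illustrative: the function name and the idea of passing both model names as plain strings are assumptions, not Dify's actual configuration API, and the model names in the usage note are examples only.

```python
from typing import Optional


def rerank_mismatch(kb_rerank_model: str, system_rerank_model: str) -> Optional[str]:
    """Return a warning if the knowledge base and system settings name different rerank models.

    Hypothetical helper: both arguments are plain model-name strings
    that you read from your own settings by hand.
    """
    if kb_rerank_model == system_rerank_model:
        return None
    return (
        f"Knowledge base uses '{kb_rerank_model}' but system settings use "
        f"'{system_rerank_model}'; align them before indexing."
    )
```

For example, calling it with a v3 model name for the knowledge base and a v2 model name for the system default returns a warning string, while identical names return `None`.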
@AndyMik90 is this model?
Yes
Hi, @AndyMik90
I'm helping the Dify team manage their backlog and am marking this issue as stale. From what I understand, you reported an issue involving adding new knowledge to the Dify API server, resulting in an error and server shutdown. The error message indicates a "Failed to fetch" TypeError, and the issue started occurring in version 0.6.3. There were discussions around potential solutions addressing memory management and HTTP request issues. However, the current status of the issue is unresolved.
Could you please confirm if this issue is still relevant to the latest version of the Dify repository? If it is, please let the Dify team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and cooperation.
Still an issue for me
W. Logan Clark
Still an issue for me as well. Cohere embed-multilingual-v3.0 cannot be used (I tried mostly with PDF documents, and I have Milvus configured as a standalone DB). Selecting that model causes an endless indexing process. Switching to OpenAI text-embedding-3-large works as expected.
thanks loris luise