Repeatedly deleting and re-uploading files through the knowledge base API eventually leaves the knowledge base node unable to query any data
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.8.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
When the knowledge base API is used to repeatedly delete and re-upload files, after a certain number of cycles the knowledge base node can no longer query any data, although the data can still be found in the knowledge base retrieval test.
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
The issue of the knowledge base node being unable to query any data after repeatedly deleting and re-uploading files is likely due to the error code dataset_not_initialized. This error indicates that the knowledge base is still initializing or indexing, and thus cannot process new queries or operations until this process is complete.
Supporting evidence from the document:
| dataset_not_initialized | 400 | The knowledge base is still initializing or indexing. Please wait. |
This suggests that if you delete and re-upload files too frequently, the system may not have enough time to complete the indexing process, leading to a state where the knowledge base is perpetually initializing or indexing, and thus unable to handle queries.
Additionally, the provided test cases in the test_client.py file show that there are deliberate pauses (time.sleep(1)) between operations such as creating, updating, and deleting documents. This indicates that allowing some time for the system to process each operation is crucial to avoid issues related to the knowledge base being in an uninitialized state. Therefore, incorporating delays between delete and re-upload operations might help mitigate this issue [1][2].
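The suggestion above (leave time for indexing between a delete and a re-upload) can be sketched as a polling loop rather than a fixed `time.sleep(1)`. This is only a minimal sketch under stated assumptions: `delete`, `upload`, and `get_status` are hypothetical callables that you would wrap around your own knowledge base API calls, and the status strings `"completed"` and `"error"` are assumed values, not confirmed by this thread.

```python
import time
from typing import Callable


def wait_until_indexed(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it reports completion or the timeout elapses.

    get_status is a hypothetical callable returning the current indexing
    status as a string; "completed" and "error" are assumed status values.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "completed":
            return True
        if status == "error":
            raise RuntimeError("indexing failed")
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


def replace_document(
    delete: Callable[[], None],
    upload: Callable[[], None],
    get_status: Callable[[], str],
    **wait_kwargs,
) -> bool:
    """Delete a document, upload its replacement, then block until indexed.

    Returns False if indexing did not finish within the timeout, so the
    caller can stop instead of starting the next delete/upload cycle.
    """
    delete()
    upload()
    return wait_until_indexed(get_status, **wait_kwargs)
```

Stopping the loop when `replace_document` returns `False` avoids queuing another delete while the previous upload is still being indexed, which is the situation the reply above suspects.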
Please run the single node (knowledge-retrieval-node) on its own. Does the issue still exist?
Please run the single node (knowledge retrieval node) on its own. Does the issue still exist?
My situation is slightly different from his: I am unable to retrieve paragraphs that were added to an existing knowledge base. I checked the API logs, which show the following messages:
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-09-26T04:52:02Z"}
{"action":"read_disk_use","level":"warning","msg":"disk usage currently at 89.52%, threshold set to 80.00%","path":"/var/lib/weaviate","time":"2024-09-26T04:52:31Z"}
[2024-09-26 03:16:43,377: INFO/MainProcess] Start process document: ce79ecce-445b-4835-9052-720c7491dc5b
[2024-09-26 03:16:47,939: WARNING/MainProcess] {'error': [{'message': 'store is read-only'}]}
[2024-09-26 03:16:47,993: INFO/MainProcess] Processed dataset: 896b6e41-8c92-4090-96cf-50bd643e1c7c latency: 4.618616024963558
[2024-09-26 03:16:48,008: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[9423068a-d1c4-48a9-9730-d1519d697d6d] succeeded in 4.634086176753044s: None
I'm not sure if these errors are related. https://github.com/langgenius/dify/issues/8768
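To check whether the disk usage warning shows up alongside the `store is read-only` warnings, the Weaviate JSON log lines quoted above can be filtered with a short script. This is a minimal sketch: the helper name `parse_disk_warning` is hypothetical, and the message pattern is taken only from the single log line in this comment, so other Weaviate versions may word it differently. Whether the read-only state is actually caused by disk usage is an assumption, not something the logs here prove.

```python
import json
import re

# Pattern copied from the "read_disk_use" message quoted in this thread.
USAGE_RE = re.compile(
    r"disk usage currently at ([\d.]+)%, threshold set to ([\d.]+)%"
)


def parse_disk_warning(line: str):
    """Return disk usage details from a Weaviate JSON log line, or None.

    Lines that are not JSON objects (for example the Celery worker lines)
    or that are not "read_disk_use" records are skipped.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or record.get("action") != "read_disk_use":
        return None
    match = USAGE_RE.search(record.get("msg", ""))
    if match is None:
        return None
    usage = float(match.group(1))
    threshold = float(match.group(2))
    return {
        "path": record.get("path"),
        "usage": usage,
        "threshold": threshold,
        "over_by": round(usage - threshold, 2),
    }
```

Applied to the warning line above, this reports a usage of 89.52% against a threshold of 80.00% for `/var/lib/weaviate`; freeing disk space on that volume and re-running the indexing task would be one way to test whether the two warnings are related.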
Please run the single node (knowledge-retrieval-node) on its own. Does the issue still exist?
Yes, and no errors are shown in the Docker log (dify-api).
https://github.com/weaviate/weaviate/issues/5912
A temporary workaround is to create a new knowledge base and update your workflow to use it.
Please run the single node (knowledge-retrieval-node) on its own. Does the issue still exist?
The same problem occurred again. The dify-api logs are shown below. Is this enough to help you locate the bug?
The workflow is as follows:
Hi, @lzzzzzzzzz. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.
Issue Summary
- Bug in the Dify knowledge base API causing unresponsiveness after repeated file deletions and re-uploads.
- Suggested cause: perpetual initialization or indexing state.
- Testing with a single node confirmed the issue persists without Docker log errors.
- Related API log warnings about disk usage and read-only errors noted.
Next Steps
- Is this issue still relevant to the latest version of the Dify repository? If so, please comment to keep the discussion open.
- Otherwise, this issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!