Repeatedly deleting and re-uploading files through the knowledge base API eventually leaves the knowledge base node unable to query any data

Open lzzzzzzzzz opened this issue 1 year ago • 8 comments

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.8.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When files are repeatedly deleted and re-uploaded through the knowledge base API, after a certain number of times the knowledge base node can no longer query any data, although the data can still be found in the knowledge base retrieval test.

知识库清空重传.yml.txt

image image

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

lzzzzzzzzz avatar Sep 23 '24 07:09 lzzzzzzzzz

The issue of the knowledge base node being unable to query any data after repeatedly deleting and re-uploading files is likely due to the error code dataset_not_initialized. This error indicates that the knowledge base is still initializing or indexing, and thus cannot process new queries or operations until this process is complete.

Supporting evidence from the document:

| dataset_not_initialized | 400 | The knowledge base is still initializing or indexing. Please wait. |

This suggests that if you delete and re-upload files too frequently, the system may not have enough time to complete the indexing process, leading to a state where the knowledge base is perpetually initializing or indexing, and thus unable to handle queries.

Additionally, the provided test cases in the test_client.py file show that there are deliberate pauses (time.sleep(1)) between operations such as creating, updating, and deleting documents. This indicates that allowing some time for the system to process each operation is crucial to avoid issues related to the knowledge base being in an uninitialized state. Therefore, incorporating delays between delete and re-upload operations might help mitigate this issue [1][2].
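As a rough illustration of the delay suggestion above, the sketch below waits for a document to finish indexing before the next delete or re-upload, rather than relying on a fixed sleep. The `fetch_status` callable, the status strings "completed" and "error", and the timeout values are all assumptions made for illustration; in a real script `fetch_status` would wrap whatever call reports a document's indexing status in your deployment.

```python
import time
from typing import Callable


def wait_until_indexed(
    fetch_status: Callable[[], str],
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_status() until it reports completion.

    Returns True once the status is "completed" and False if the
    timeout elapses first. Raises RuntimeError on an "error" status.
    The status strings are assumptions, not confirmed Dify values.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "completed":
            return True
        if status == "error":
            raise RuntimeError("indexing failed")
        if time.monotonic() >= deadline:
            return False
        # Back off before asking again so the indexer is not hammered.
        sleep(poll_interval_s)
```

Calling this after each upload, and only deleting or re-uploading once it returns True, is one way to avoid issuing new operations while the knowledge base is still indexing. Whether that resolves the behavior reported here is not confirmed.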

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Sep 23 '24 07:09 dosubot[bot]

Please run the single node (knowledge-retrieval-node). Does the issue still exist?

JohnJyong avatar Sep 23 '24 08:09 JohnJyong

Please run the single node (knowledge retrieval node). Does the issue still exist?

My situation differs somewhat from his: I am unable to retrieve paragraphs added to an existing knowledge base. I checked the API logs, which show the following messages:

{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-09-26T04:52:02Z"}
{"action":"read_disk_use","level":"warning","msg":"disk usage currently at 89.52%, threshold set to 80.00%","path":"/var/lib/weaviate","time":"2024-09-26T04:52:31Z"}
[2024-09-26 03:16:43,377: INFO/MainProcess] Start process document: ce79ecce-445b-4835-9052-720c7491dc5b
[2024-09-26 03:16:47,939: WARNING/MainProcess] {'error': [{'message': 'store is read-only'}]}
[2024-09-26 03:16:47,993: INFO/MainProcess] Processed dataset: 896b6e41-8c92-4090-96cf-50bd643e1c7c latency: 4.618616024963558
[2024-09-26 03:16:48,008: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[9423068a-d1c4-48a9-9730-d1519d697d6d] succeeded in 4.634086176753044s: None

I'm not sure if these errors are related. https://github.com/langgenius/dify/issues/8768
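To check whether the two warnings above appear together in a larger log dump, here is a small sketch that scans mixed log output for Weaviate's JSON `read_disk_use` warnings and for the `store is read-only` message in the worker log. The function name `scan_logs` and the shape of the returned dictionary are made up for illustration; the matching is based only on the log lines quoted above.

```python
import json
import re

DISK_RE = re.compile(
    r"disk usage currently at ([\d.]+)%, threshold set to ([\d.]+)%"
)


def scan_logs(lines):
    """Collect disk-usage warnings and read-only errors from log lines."""
    findings = {"disk_warnings": [], "read_only_errors": 0}
    for line in lines:
        line = line.strip()
        if line.startswith("{"):
            # The Weaviate lines quoted above are one JSON object per line.
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            match = DISK_RE.search(str(entry.get("msg", "")))
            if entry.get("action") == "read_disk_use" and match:
                usage, threshold = map(float, match.groups())
                findings["disk_warnings"].append((usage, threshold))
        elif "store is read-only" in line:
            # Worker lines are plain text with an embedded dict repr.
            findings["read_only_errors"] += 1
    return findings
```

If both kinds of entries show up, free disk space on the Weaviate volume is a plausible first thing to check, though nothing in this thread confirms it as the cause.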

PolarPeak avatar Sep 26 '24 06:09 PolarPeak

Please run the single node (knowledge-retrieval-node). Does the issue still exist?

Yes, and no errors show in the Docker log (dify-api).

lzzzzzzzzz avatar Sep 26 '24 06:09 lzzzzzzzzz

https://github.com/weaviate/weaviate/issues/5912

PolarPeak avatar Oct 08 '24 02:10 PolarPeak

A temporary workaround is to create a new knowledge base and modify your workflow to use it.

lzzzzzzzzz avatar Oct 08 '24 04:10 lzzzzzzzzz

Please run the single node (knowledge-retrieval-node). Does the issue still exist?

The same problem occurred again. The dify-api logs are shown below. Is this enough to help you locate the bug?

image

The workflow is as follows: image

lzzzzzzzzz avatar Oct 08 '24 06:10 lzzzzzzzzz

Hi, @lzzzzzzzzz. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary

  • Bug in the Dify knowledge base API causing unresponsiveness after repeated file deletions and re-uploads.
  • Suggested cause: perpetual initialization or indexing state.
  • Testing with a single node confirmed the issue persists without Docker log errors.
  • Related API log warnings about disk usage and read-only errors noted.

Next Steps

  • Is this issue still relevant to the latest version of the Dify repository? If so, please comment to keep the discussion open.
  • Otherwise, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

dosubot[bot] avatar Nov 25 '24 16:11 dosubot[bot]