Document upload fails with an embedding error, and after the failed document is deleted, the knowledge base still retains its content
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.6.11
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
1. When I upload a PDF, the web UI shows an error.
I then checked the Ollama log.
Reviewing the PDF, it seems that there are multiple line breaks on the last page of the document.
2. When I delete the PDF that failed to upload, I can still retrieve its records from the knowledge base (this happens only when at least one other file remains in the knowledge base; otherwise no records are returned).
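If the run of line breaks on the last page is what breaks the embedding request (this is only the reporter's guess and is not confirmed), one way to test the hypothesis is to normalize the extracted text before it is chunked and embedded, then retry. A minimal sketch, assuming the PDF text has already been extracted into a Python string; `normalize_line_breaks` is a hypothetical helper, not part of Dify.

```python
import re


def normalize_line_breaks(text: str) -> str:
    """Collapse runs of blank lines in extracted PDF text (hypothetical helper)."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Strip trailing spaces/tabs so whitespace-only lines count as blank.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Replace three or more consecutive newlines with one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Drop leading/trailing whitespace, e.g. an empty tail on the last page.
    return text.strip()


if __name__ == "__main__":
    last_page = "Final paragraph.\n\n\n\n   \n\n\n"
    print(repr(normalize_line_breaks(last_page)))  # 'Final paragraph.'
```

If the cleaned text uploads and embeds without error, the blank-line run is a likely trigger; if not, the cause is elsewhere.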
✔️ Expected Behavior
1. The PDF uploads successfully.
2. If the PDF upload fails, its records can be deleted from the knowledge base completely.
❌ Actual Behavior
- The PDF upload fails with an error.
- After the failed upload is deleted, residual records remain in the knowledge base.
To address the issues you're encountering with document upload and deletion in your self-hosted Dify version 0.6.11, follow these steps:
- 500 Server Error on PDF Upload:
  - Ensure that the PDF file size does not exceed the 15MB limit.
  - Verify that the ETL configuration is correctly set up. For the community version, Dify ETL is used by default, but you can enable Unstructured ETL through environment variables if needed. Check the environment variables configuration [1].
- Incomplete Deletion of Failed Uploads:
  - When deleting a failed upload, ensure that all related records are removed from the knowledge base. This might involve manually cleaning up the database entries if the automatic deletion process fails.
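To rule out the size limit before uploading, a quick local check can be run first. A minimal sketch: the 15MB figure is taken from the reply above and is interpreted here as 15 × 1024 × 1024 bytes, which is an assumption; if your deployment uses a different limit, pass it as `limit_mb`. `check_upload_size` is a hypothetical helper.

```python
import os

# Assumed limit from the reply above (15MB), interpreted as MiB.
DEFAULT_LIMIT_MB = 15


def check_upload_size(path: str, limit_mb: int = DEFAULT_LIMIT_MB) -> tuple[bool, int]:
    """Return (within_limit, size_in_bytes) for the file at `path`."""
    size = os.path.getsize(path)
    return size <= limit_mb * 1024 * 1024, size


if __name__ == "__main__":
    import sys

    # Usage: python check_size.py file1.pdf file2.pdf ...
    for pdf in sys.argv[1:]:
        ok, size = check_upload_size(pdf)
        status = "OK" if ok else "TOO LARGE"
        print(f"{pdf}: {size / (1024 * 1024):.2f} MB -> {status}")
```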
Here is a sample code snippet to manually clean up the database entries for a failed upload:
# Example: Python script to clean up failed upload records from the database.
# Assumption: the table names are illustrative; check them against your actual
# schema. Dify's Docker deployment uses PostgreSQL, so adapt the driver if needed.
import sqlite3

def delete_failed_uploads(db_path, document_id):
    conn = None  # defined up front so `finally` works even if connect() fails
    try:
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        # Delete related entries from the child tables first
        cursor.execute("DELETE FROM document_chunks WHERE document_id = ?", (document_id,))
        cursor.execute("DELETE FROM document_metadata WHERE document_id = ?", (document_id,))
        # Then delete from the main document table
        cursor.execute("DELETE FROM documents WHERE id = ?", (document_id,))
        conn.commit()
        print(f"Successfully deleted records for document ID {document_id}")
    except sqlite3.Error as e:
        if conn is not None:
            conn.rollback()  # undo partial deletes so the tables stay consistent
        print(f"Error deleting records: {e}")
    finally:
        if conn is not None:
            conn.close()

# Usage
db_path = 'path_to_your_database.db'
document_id = 'failed_document_id'
delete_failed_uploads(db_path, document_id)
Make sure to replace path_to_your_database.db and failed_document_id with the actual path to your database and the ID of the failed document upload. This script removes the rows that reference the failed document from these tables [2].
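Before deleting anything by hand, it is safer to do a dry run that counts the rows each table holds for the document, then delete them in a single all-or-nothing transaction. A minimal sketch against SQLite; the three table names come from the script above and are hypothetical, not necessarily Dify's real schema. Note also that in a Dify deployment the embedded chunks live in the configured vector store, which a SQL-only cleanup like this does not touch.

```python
import sqlite3

# Hypothetical tables from the cleanup script above, children first so that
# foreign-key constraints (if any) are not violated during deletion.
TABLES = {
    "document_chunks": "document_id",
    "document_metadata": "document_id",
    "documents": "id",
}


def count_related_rows(conn: sqlite3.Connection, document_id: str) -> dict[str, int]:
    """Dry run: how many rows in each table reference the document."""
    counts = {}
    for table, column in TABLES.items():
        # Table/column names come from the fixed TABLES mapping, not user input.
        row = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {column} = ?", (document_id,)
        ).fetchone()
        counts[table] = row[0]
    return counts


def delete_related_rows(conn: sqlite3.Connection, document_id: str) -> None:
    """Delete every row for the document in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        for table, column in TABLES.items():
            conn.execute(f"DELETE FROM {table} WHERE {column} = ?", (document_id,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (id TEXT PRIMARY KEY);
        CREATE TABLE document_chunks (id INTEGER PRIMARY KEY, document_id TEXT);
        CREATE TABLE document_metadata (id INTEGER PRIMARY KEY, document_id TEXT);
        INSERT INTO documents VALUES ('failed-doc'), ('other-doc');
        INSERT INTO document_chunks (document_id)
            VALUES ('failed-doc'), ('failed-doc'), ('other-doc');
        INSERT INTO document_metadata (document_id) VALUES ('failed-doc');
        """
    )
    print(count_related_rows(conn, "failed-doc"))
    # {'document_chunks': 2, 'document_metadata': 1, 'documents': 1}
    delete_related_rows(conn, "failed-doc")
    print(count_related_rows(conn, "failed-doc"))
    # {'document_chunks': 0, 'document_metadata': 0, 'documents': 0}
    conn.close()
```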
Additionally, you can manage document segments and metadata through the Dify interface:
- Viewing and Editing Text Segments: You can view, add, and edit text segments in the segment list. This includes checking segment quality to ensure there are no overly short or long segments that could affect the Q&A performance.
- Metadata Management: Manage metadata information such as titles, URLs, keywords, and descriptions, which are used in the segment recall process.
- Document Disable and Archive: You can disable or archive documents that are temporarily not needed or no longer in use. Disabled documents will not be indexed, and archived documents can only be viewed or deleted, not edited [2].
For more detailed instructions, refer to the Knowledge Base and Document Maintenance documentation [2].
To continue talking to Dosu, mention @dosu.
@dosu Here is the log from docker-api: the task tasks.clean_document_task.clean_document_task reports an exception when the document is deleted.
Another issue has also come up: I can't delete the knowledge base, even though it has no associated applications.
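The symptom in this report (the document disappears from the list but its content is still retrievable) is what one would expect if the document record is removed but the cleanup task fails before the document's chunks are removed from the search index. The toy model below illustrates that failure mode with two in-memory dicts; `clean_document`, `search`, `VectorStoreError`, and the step order are all hypothetical and do not reproduce what tasks.clean_document_task actually does.

```python
class VectorStoreError(Exception):
    """Hypothetical error raised when the search index cannot be updated."""


def clean_document(
    documents: dict[str, dict],
    index: dict[str, tuple[str, str]],
    document_id: str,
    fail_index: bool = False,
) -> list[str]:
    """Remove a document from both stores and return the steps that failed."""
    errors: list[str] = []
    # Step 1: remove the document record (what the document list shows).
    documents.pop(document_id, None)
    # Step 2: remove the document's chunks from the search index.
    try:
        if fail_index:  # simulate an exception during index cleanup
            raise VectorStoreError("index backend unavailable")
        stale = [cid for cid, (doc_id, _) in index.items() if doc_id == document_id]
        for cid in stale:
            del index[cid]
    except VectorStoreError as exc:
        errors.append(f"index cleanup failed: {exc}")
    return errors


def search(index: dict[str, tuple[str, str]], term: str) -> list[str]:
    """Naive retrieval: return every indexed chunk whose text contains `term`."""
    return [text for _, text in index.values() if term in text]


if __name__ == "__main__":
    documents = {"failed.pdf": {}, "other.pdf": {}}
    index = {
        "c1": ("failed.pdf", "text from the failed PDF"),
        "c2": ("other.pdf", "other text"),
    }
    print(clean_document(documents, index, "failed.pdf", fail_index=True))
    # ['index cleanup failed: index backend unavailable']
    print("failed.pdf" in documents, search(index, "failed PDF"))
    # False ['text from the failed PDF']  (record gone, chunk still retrievable)
```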
To address the issues you're encountering with document deletion and knowledge base management in your self-hosted Dify environment, follow these steps:
1. 500 Server Error During Document Deletion
The 500 Server Error returned by the Ollama model provider during a text embedding request indicates a server-side issue. This error is mapped to InvokeServerUnavailableError, which is associated with requests.exceptions.ConnectionError and requests.exceptions.HTTPError [1].
Steps to Resolve:
- Check Server Logs: Review the server logs for more detailed error messages and stack traces.
- Verify Model Provider Configuration: Ensure that the Ollama model provider is correctly configured and accessible.
- Restart Services: Restart the relevant services to clear any transient issues.
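The error mapping described above can be pictured as a lookup from low-level exception types to a provider-level error. A minimal sketch that encodes only the one mapping stated in the reply; the exception classes here are hypothetical stand-ins (so the sketch runs without the `requests` package), and Dify's real mapping may cover more error types. The practical takeaway is that an InvokeServerUnavailableError points at the Ollama server side, so its own logs are the place to look.

```python
class InvokeError(Exception):
    """Stand-in base class for errors surfaced by a model provider."""


class InvokeServerUnavailableError(InvokeError):
    """The provider was reached but could not serve the request (e.g. HTTP 500)."""


class ProviderConnectionError(Exception):
    """Hypothetical stand-in for requests.exceptions.ConnectionError."""


class ProviderHTTPError(Exception):
    """Hypothetical stand-in for requests.exceptions.HTTPError."""


# Which low-level exceptions map to which invoke error, per the reply above.
INVOKE_ERROR_MAPPING: dict[type[InvokeError], tuple[type[Exception], ...]] = {
    InvokeServerUnavailableError: (ProviderConnectionError, ProviderHTTPError),
}


def transform_invoke_error(error: Exception) -> Exception:
    """Wrap a low-level provider error in its mapped invoke error, if any."""
    for invoke_error, source_errors in INVOKE_ERROR_MAPPING.items():
        if isinstance(error, source_errors):
            return invoke_error(f"{type(error).__name__}: {error}")
    return error  # unmapped errors are returned unchanged


if __name__ == "__main__":
    err = transform_invoke_error(ProviderHTTPError("500 Server Error"))
    print(type(err).__name__, "-", err)
    # InvokeServerUnavailableError - ProviderHTTPError: 500 Server Error
```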
2. 404 Not Found Error When Deleting Objects
The 404 Not Found error when attempting to delete objects can be due to several reasons, such as incorrect endpoint URLs, invalid authorization tokens, or non-existent records.
Best Practices:
- Verify Endpoint URL: Ensure the endpoint URL is correct.
- Check Authorization: Confirm that the authorization token is valid and has the necessary permissions.
- Validate Record IDs: Ensure that the record IDs you are trying to delete exist in the database [2][3].
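Validating the record ID can be folded into the delete itself: issue the DELETE and inspect how many rows it removed, so a missing record becomes an explicit, expected result rather than a surprise. A minimal sketch against a local SQLite database; the `documents` table and the `delete_if_exists` helper are hypothetical and only mirror the check described above, not Dify's API. Checking the affected row count also avoids the race between a separate existence query and the delete.

```python
import sqlite3


def delete_if_exists(conn: sqlite3.Connection, document_id: str) -> bool:
    """Delete a document row; return False if the ID did not exist."""
    with conn:  # single transaction: commit on success, rollback on error
        cursor = conn.execute("DELETE FROM documents WHERE id = ?", (document_id,))
    # rowcount is the number of rows the DELETE actually removed.
    return cursor.rowcount > 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO documents VALUES ('doc-1')")
    conn.commit()
    print(delete_if_exists(conn, "doc-1"))  # True
    print(delete_if_exists(conn, "doc-1"))  # False: already gone, the local "404"
    conn.close()
```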
3. Unable to Delete Knowledge Base
The error message "The dataset is being used by some apps. Please remove the dataset from the apps before deleting it" indicates that the dataset is still linked to some applications.
Steps to Resolve:
- Unlink Dataset from Applications: Navigate to the applications using the dataset and remove the dataset from those applications.
- Check for Residual Links: Ensure there are no residual links or references to the dataset in the applications.
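To check for residual links directly in the database, one option is to list every app that still references the dataset. A minimal sketch against SQLite; it assumes a join table named `app_dataset_joins` with `app_id` and `dataset_id` columns, which is an assumption about the schema to verify before use (Dify's Docker deployment stores its data in PostgreSQL, so the query would need to run there). If it returns app IDs even though no app shows the dataset in the UI, those rows are the residual links to remove.

```python
import sqlite3


def apps_using_dataset(conn: sqlite3.Connection, dataset_id: str) -> list[str]:
    """Return the IDs of apps that still reference `dataset_id` (assumed schema)."""
    rows = conn.execute(
        "SELECT DISTINCT app_id FROM app_dataset_joins "
        "WHERE dataset_id = ? ORDER BY app_id",
        (dataset_id,),
    ).fetchall()
    return [app_id for (app_id,) in rows]


def can_delete_dataset(conn: sqlite3.Connection, dataset_id: str) -> bool:
    """Deletion is safe only when no app references the dataset."""
    return not apps_using_dataset(conn, dataset_id)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE app_dataset_joins (app_id TEXT, dataset_id TEXT);
        INSERT INTO app_dataset_joins VALUES ('app-1', 'kb-1'), ('app-2', 'kb-1');
        """
    )
    print(apps_using_dataset(conn, "kb-1"))  # ['app-1', 'app-2']
    print(can_delete_dataset(conn, "kb-2"))  # True
    conn.close()
```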
Example Code for Manual Cleanup
If automatic deletion fails, you can manually clean up the database entries for a failed upload:
# Assumption: the table names are illustrative; check them against your actual schema.
import sqlite3

def delete_failed_uploads(db_path, document_id):
    conn = None  # defined up front so `finally` works even if connect() fails
    try:
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        # Delete related entries from the child tables first
        cursor.execute("DELETE FROM document_chunks WHERE document_id = ?", (document_id,))
        cursor.execute("DELETE FROM document_metadata WHERE document_id = ?", (document_id,))
        # Then delete from the main document table
        cursor.execute("DELETE FROM documents WHERE id = ?", (document_id,))
        conn.commit()
        print(f"Successfully deleted records for document ID {document_id}")
    except sqlite3.Error as e:
        if conn is not None:
            conn.rollback()  # undo partial deletes so the tables stay consistent
        print(f"Error deleting records: {e}")
    finally:
        if conn is not None:
            conn.close()

# Usage
db_path = 'path_to_your_database.db'
document_id = 'failed_document_id'
delete_failed_uploads(db_path, document_id)
Replace path_to_your_database.db and failed_document_id with the actual path to your database and the ID of the failed document upload.
By following these steps, you should be able to resolve the issues with document deletion and knowledge base management in your Dify environment.