[Bug] Documents stuck in "waiting" status when RAG Pipeline is enabled (runtime_mode = 'rag_pipeline')
Self Checks
- [x] I have read the Contributing Guide and Language Policy.
- [x] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report, otherwise it will be closed.
- [x] [Chinese & non-English users] Please submit in English, otherwise the issue will be closed :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.9.2
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
- Create a new knowledge base
- Enable Pipeline / Retrieval Settings for the knowledge base
- Upload any document to this knowledge base
- The document gets stuck in "Queuing" status indefinitely
Root Cause

File: /app/api/services/dataset_service.py, line ~559:

```python
if dataset.runtime_mode != "rag_pipeline":
    document_indexing_task.delay(dataset_id, [document.id])
```

When the pipeline is enabled, `runtime_mode` is set to `'rag_pipeline'`, so the condition evaluates to False and the indexing task is never triggered.
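The faulty branch reduces to a single predicate. A minimal sketch of the logic (the helper name `should_enqueue_indexing` is illustrative, not from the Dify codebase):

```python
# Minimal sketch of the buggy dispatch condition (illustrative, not Dify's actual code).
def should_enqueue_indexing(runtime_mode: str) -> bool:
    # The guard only enqueues the standard indexing task outside pipeline mode,
    # and no alternative code path enqueues anything in 'rag_pipeline' mode.
    return runtime_mode != "rag_pipeline"

print(should_enqueue_indexing("general"))       # True: task is enqueued
print(should_enqueue_indexing("rag_pipeline"))  # False: nothing is ever enqueued
```

Since there is no `else` branch anywhere, `'rag_pipeline'` documents are silently dropped.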
Database Evidence

All waiting documents have `runtime_mode = 'rag_pipeline'`:

```sql
SELECT d.runtime_mode, doc.indexing_status, COUNT(*)
FROM datasets d
JOIN documents doc ON d.id = doc.dataset_id
WHERE doc.indexing_status = 'waiting'
GROUP BY d.runtime_mode, doc.indexing_status;
```
Result: all waiting documents have `runtime_mode = 'rag_pipeline'`. Successfully indexed documents have `runtime_mode = 'general'` or were uploaded before the pipeline was enabled.
✔️ Expected Behavior
Documents should be indexed automatically after upload, regardless of whether Pipeline is enabled or not.
❌ Actual Behavior
- Documents remain in "Queuing/Waiting" status forever
- No indexing task is sent to Celery worker (verified by checking Redis queues and worker logs)
- Documents cannot be used for retrieval
- No error message shown to user
Verification

The worker itself is functioning correctly: when the task is triggered manually, it processes documents successfully:

```python
document_indexing_task.apply_async(
    args=[dataset_id, [document_id]],
    queue='dataset'
)
```

This works, which proves the problem is that the task is never sent, not that the worker is broken.
Suggested Fix

Remove the conditional check at line ~559:

```python
# Change from:
if dataset.runtime_mode != "rag_pipeline":
    document_indexing_task.delay(dataset_id, [document.id])

# To:
document_indexing_task.delay(dataset_id, [document.id])
```

Or implement proper pipeline-specific indexing logic if different handling is intended.
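If pipeline mode is meant to have its own indexing path, the dispatch could branch instead of silently dropping the task. A hedged sketch of that alternative (the pipeline task is hypothetical; Dify may handle this elsewhere):

```python
# Illustrative dispatch: always enqueue *some* indexing work, routed by mode.
# The pipeline task here is a stand-in, not an actual Dify task.
def dispatch_indexing(runtime_mode, dataset_id, document_id, general_task, pipeline_task):
    if runtime_mode == "rag_pipeline":
        pipeline_task(dataset_id, [document_id])
    else:
        general_task(dataset_id, [document_id])

# Stub "tasks" record what would be enqueued:
calls = []
general = lambda ds, docs: calls.append(("general", ds, docs))
pipeline = lambda ds, docs: calls.append(("pipeline", ds, docs))

dispatch_indexing("rag_pipeline", "ds1", "doc1", general, pipeline)
dispatch_indexing("general", "ds1", "doc2", general, pipeline)
print(calls)  # both documents got a task, routed by mode
```

Either way, the invariant should be that every uploaded document results in exactly one enqueued indexing task.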
Workaround

Manually trigger indexing via a Python script:

```python
from app_factory import create_app

app = create_app()
with app.app_context():
    from tasks.document_indexing_task import document_indexing_task
    from models.dataset import Document
    from extensions.ext_database import db

    waiting_docs = db.session.query(Document).filter(
        Document.indexing_status == 'waiting'
    ).all()
    for doc in waiting_docs:
        document_indexing_task.apply_async(
            args=[str(doc.dataset_id), [str(doc.id)]],
            queue='dataset'
        )
```
Impact
- Severity: High (the Pipeline feature is completely unusable)
- Affected: all users who enable Pipeline/Retrieval Settings
- Workaround: requires database access
Hi @JackyX1996! I'm Dosu and I’m helping the dify team.
This bug is caused by a conditional check that prevents the document indexing task from being triggered when runtime_mode is set to 'rag_pipeline', leaving documents stuck in "waiting" status and unusable for retrieval. There is no alternative indexing logic for pipeline mode in the affected code section, so removing the check will let standard indexing run and resolve the issue; be mindful of possible side effects if pipeline-specific logic is added elsewhere in the future. As a workaround, manually triggering the indexing task via Python works.
A recent merged pull request has addressed a related queuing issue by updating Celery worker queue configurations to include priority_pipeline, which should help resolve stuck tasks in newer versions. For persistent session errors in pipeline indexing, upgrading to Dify 2.0.0 beta or newer is recommended, as it includes fixes for session management and pipeline document processing. If you cannot upgrade, ensure you only pass primary key IDs between pipeline steps and always re-query ORM objects inside a new session context.
For more details, see:
- Conditional check root cause and code context
- Celery queue fix PR #27416
- Pipeline session management and upgrade guidance
To reply, just mention @dosu.
It seems this issue has been resolved in the pull request at https://github.com/langgenius/dify/pull/27416/files#diff-c7b1c23e00a1fd0378f6ea41a7dfe1c9e043d766f738dba26860c783b82e8c6a
What is the release cadence for Docker images? I pulled an image and deployed a service today, but this PR hasn't made it into a released Docker image yet.