Dify Knowledge => Chunk Settings => Delimiter does not work as expected
Self Checks
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.3.0
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Step 1. Create a knowledge base and upload a file.
Step 2. In Chunk Settings, enter a delimiter, e.g. "\n\n,\n".
✔️ Expected Behavior
The text is split by "\n\n" first, then by "\n".
❌ Actual Behavior
The text is split by the literal string "\n\n,\n".
Below is the explanation shown in the UI: """ A delimiter is the character used to separate text. \n\n and \n are commonly used delimiters for separating paragraphs and lines. Combined with commas (\n\n,\n), paragraphs will be segmented by lines when exceeding the maximum chunk length. You can also use special delimiters defined by yourself (e.g. ***). """
Below is the explanation from the documentation:
""" Chunk identifier, The default value is \n, which means the text will be chunked by paragraphs. You can customize chunking rules using regex. The system will automatically execute chunking whenever it detects the specified delimiter. For example, means chunk the text by sentences. """
There are no proper examples to help us understand this better, which is very confusing.
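To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is not Dify's implementation: the function names `split_literal` and `split_hierarchical` and the `max_len` parameter are hypothetical, and the sketch assumes the delimiter field has already been unescaped into real newline characters. The hierarchical version follows the UI text, falling back to the next delimiter only when a piece exceeds the maximum chunk length.

```python
def split_literal(text, delimiter):
    """Reported behavior: the whole field is treated as ONE delimiter string."""
    return [piece for piece in text.split(delimiter) if piece]


def split_hierarchical(text, delimiters, max_len):
    """Behavior described by the UI text (assumed): split by the first
    delimiter, then split any piece longer than max_len by the next one."""
    if not delimiters:
        return [text] if text else []
    head, rest = delimiters[0], delimiters[1:]
    chunks = []
    for piece in text.split(head):
        if not piece:
            continue
        if len(piece) > max_len and rest:
            # Piece is too long: fall back to the next, finer delimiter.
            chunks.extend(split_hierarchical(piece, rest, max_len))
        else:
            chunks.append(piece)
    return chunks


text = "para one line a\npara one line b\n\nshort"
field = "\n\n,\n"  # what the user typed, after unescaping (assumption)

literal = split_literal(text, field)
expected = split_hierarchical(text, field.split(","), max_len=20)

print(literal)   # one chunk: the text never contains "\n\n,\n"
print(expected)  # three chunks: long paragraph split into lines
```

With this input, the literal split never finds the string "\n\n,\n" in the text and returns it as a single chunk, which matches the behavior reported above.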
I found a similar unsolved discussion regarding the delimiter feature in Dify: Knowledge chunking delimiter, which was last updated on April 16, 2025 [1].
Regarding your issue, the expected behavior according to the Dify documentation is that delimiters like \n\n and \n should be used to split text into chunks. The delimiter \n\n is recommended for dividing the original document into large parent chunks, and \n for splitting parent chunks into smaller child chunks. Users can also define their own special delimiters [2][3].
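As an illustration of the parent/child arrangement described above, the following Python sketch splits a document into parent chunks on "\n\n" and each parent into child chunks on "\n". The function name `parent_child_chunks` and the dictionary layout are hypothetical; this only shows the idea and makes no claim about how Dify stores or produces its chunks.

```python
def parent_child_chunks(text, parent_delim="\n\n", child_delim="\n"):
    """Split text into parent chunks, then each parent into child chunks."""
    result = []
    for parent in text.split(parent_delim):
        if not parent.strip():
            continue  # skip empty pieces from repeated delimiters
        children = [c for c in parent.split(child_delim) if c.strip()]
        result.append({"parent": parent, "children": children})
    return result


doc = "Intro line 1\nIntro line 2\n\nDetails line 1"
chunks = parent_child_chunks(doc)

for chunk in chunks:
    print(repr(chunk["parent"]), "->", chunk["children"])
```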
However, the behavior you are seeing is that the text is split by the combined string \n\n,\n as a single delimiter, which is not the expected behavior. This might indicate a bug in the delimiter parsing logic.
Unfortunately, I couldn't find specific information on how the delimiter parsing logic is implemented in the codebase or any test cases related to this functionality. It might be helpful to check if there are any updates or patches available for your version, or to explore the codebase for any recent changes related to delimiter handling.