Parsing markdown files can result in the loss of some code blocks
Self Checks
- [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.6.9
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
- Import a.md (attached above).
- Choose automatic or custom segmentation.
Then the bug appears: Paragraph 001 is missing some of its code, while Paragraph 002 is intact.
✔️ Expected Behavior
No code is lost.
❌ Actual Behavior
Some code blocks are lost.
@ideagay Hello! I'm a bot that can help you solve bugs, answer questions, and become a contributor. Welcome! I'll help you while you wait for a human maintainer. 🤖
To prevent the loss of code blocks when parsing markdown files in Dify version 0.6.9, you can use the react-markdown library along with react-syntax-highlighter for rendering code blocks. These dependencies are already included in your package.json file.
Here is an example of how you can set this up:
- Install the necessary packages if they are not already installed:
  npm install react-markdown react-syntax-highlighter
- Use the following code to render markdown content with syntax highlighting for code blocks:
import React from 'react';
import ReactMarkdown from 'react-markdown';
import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter';
import { dark } from 'react-syntax-highlighter/dist/esm/styles/prism';

const MarkdownRenderer = ({ content }) => {
  return (
    <ReactMarkdown
      components={{
        // Fenced blocks with a language-xxx class are highlighted with Prism;
        // inline code and unlabeled blocks stay a plain <code> element.
        code({ node, inline, className, children, ...props }) {
          const match = /language-(\w+)/.exec(className || '');
          return !inline && match ? (
            <SyntaxHighlighter style={dark} language={match[1]} PreTag="div" {...props}>
              {/* Strip the trailing newline at the end of the block */}
              {String(children).replace(/\n$/, '')}
            </SyntaxHighlighter>
          ) : (
            <code className={className} {...props}>
              {children}
            </code>
          );
        },
      }}
    >
      {content}
    </ReactMarkdown>
  );
};

export default MarkdownRenderer;
This setup ensures that code blocks in your markdown files are properly parsed and rendered with syntax highlighting, preventing any loss of code blocks [1].
Additionally, you can adjust specific configurations in the Dify platform to improve markdown parsing and prevent code block loss. These configurations are part of the segmentation and cleaning settings in custom mode:
Segmentation rules:
- Segmentation identifier: Set an identifier such as "\n"; the system splits the text wherever the identifier appears.
- Maximum segment length: The maximum number of characters per segment; text longer than this is forcibly split.
- Segment overlap length: The number of characters shared by adjacent segments. 10-25% of the segment length is recommended; the overlap helps retain semantic relevance between segments and improves recall when multiple segments are retrieved.
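To make the interaction of these three settings concrete, here is a minimal sketch. It assumes a simple strategy (split on the identifier, then cut any piece longer than the maximum length into overlapping windows); `segmentText` and `cutsThroughCodeBlock` are hypothetical names, and this is not Dify's actual splitter. It only illustrates how a hard length cut can land inside a fenced code block and leave a segment with an unmatched fence.

```javascript
// Hypothetical sketch of the three segmentation rules. NOT Dify's splitter;
// the function names and the windowing strategy are assumptions.
function segmentText(text, separator, maxLength, overlap) {
  if (overlap < 0 || overlap >= maxLength) {
    throw new RangeError('overlap must be in [0, maxLength)');
  }
  const segments = [];
  for (const piece of text.split(separator)) {
    if (piece.trim() === '') continue; // skip empty pieces between separators
    // Force-split pieces longer than maxLength. Each window starts
    // (maxLength - overlap) characters after the previous one, so adjacent
    // windows share `overlap` characters.
    const step = maxLength - overlap;
    for (let start = 0; start < piece.length; start += step) {
      segments.push(piece.slice(start, start + maxLength));
      if (start + maxLength >= piece.length) break; // this window reached the end
    }
  }
  return segments;
}

// A segment with an odd number of triple-backtick fences opens or closes a
// code block that continues in a neighboring segment.
function cutsThroughCodeBlock(segment) {
  return (segment.match(/`{3}/g) || []).length % 2 === 1;
}

// Usage: one fenced block with no blank line inside, split with a small limit.
const fence = '`'.repeat(3);
const doc = `intro\n${fence}\nconst a = 1;\n${fence}`;
const pieces = segmentText(doc, '\n\n', 12, 0);
console.log(pieces.map(cutsThroughCodeBlock)); // → [ true, false, false ]
```

In this sketch the 26-character document is cut into three windows: the first opens the block, and the closing fence itself is split across the last two. With a maximum length larger than the block (for example `segmentText(doc, '\n\n', 100, 10)`), the block stays in one segment.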
Preprocessing rules:
- Replace continuous spaces, newlines, and tabs.
- Delete all URLs and email addresses.
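A minimal sketch of what these two cleaning rules might do, assuming simple regular expressions. The function name `cleanText`, its options, and the patterns are hypothetical, not Dify's. It shows one side effect worth checking when code goes missing: collapsing whitespace also flattens indentation inside code blocks.

```javascript
// Hypothetical sketch of the two preprocessing rules. The regular expressions
// are assumptions for illustration, not the ones Dify uses.
function cleanText(text, { collapseWhitespace = true, removeUrlsAndEmails = false } = {}) {
  let out = text;
  if (collapseWhitespace) {
    out = out.replace(/\n{3,}/g, '\n\n'); // runs of 3+ newlines -> one blank line
    out = out.replace(/[ \t]+/g, ' ');    // runs of spaces/tabs -> one space
  }
  if (removeUrlsAndEmails) {
    out = out.replace(/https?:\/\/\S+/g, '');            // http(s) URLs
    out = out.replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, ''); // e-mail addresses
  }
  return out;
}

// Collapsing whitespace also flattens the indentation inside code:
console.log(JSON.stringify(cleanText('def f():\n    return 1')));
// → "def f():\n return 1"
```

In this sketch, URL and e-mail removal would likewise strip matching text inside code (for example a URL in a string literal), so both rules deserve a look when importing code-heavy markdown.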
By configuring these settings, you can ensure that the markdown content, including code blocks, is properly segmented and cleaned, thus preventing code block loss [2].
HTML attributes can't be kept on one line; they have to be broken onto a new line:
<mbutton class="xxx"></mbutton>
->
<mbutton
class="xxx">
</mbutton>
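If a file contains many such elements, the workaround could be applied automatically before import. The sketch below assumes the affected markup consists of simple one-line elements like the example above; `breakAttributes` is a hypothetical helper, it uses a regular expression rather than a real HTML parser, and the only evidence that reformatted markup survives segmentation is the observation in this comment.

```javascript
// Hypothetical helper (not part of Dify): moves the attributes of simple
// one-line elements onto their own line, as in the workaround above.
function breakAttributes(markdown) {
  // <tag attrs>inner</tag> on a single line, with no nested tags inside.
  const oneLineElement = /<([A-Za-z][\w-]*)\s+([^<>\n]+)>([^<\n]*)<\/\1>/g;
  return markdown.replace(
    oneLineElement,
    (_match, tag, attrs, inner) => `<${tag}\n${attrs.trim()}>\n${inner}</${tag}>`,
  );
}

console.log(breakAttributes('<mbutton class="xxx"></mbutton>'));
// <mbutton
// class="xxx">
// </mbutton>
```

Elements without attributes, self-closing tags, and nested markup are left untouched by this pattern.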