
Hyperlinks are lost after xlsx parsing

Open • ic-xu opened this issue 1 year ago • 1 comment

Self Checks

  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

The original text I had looked like this:

[image]

But what I got after parsing looked like this:

[image]

2. Additional context or comments

I hope that hyperlinks will not be lost after parsing and will instead be presented as markdown links.

3. Can you help us with this feature?

  • [X] I am interested in contributing to this feature.

ic-xu • Jul 23 '24 11:07

@ic-xu 🌟

To retain hyperlinks and present them in markdown format after parsing an XLSX file, set the remove_hyperlinks flag to False when initializing the MarkdownExtractor. This prevents the remove_hyperlinks method from stripping the hyperlinks out of the content.

Here is an example of how you can initialize the MarkdownExtractor to retain hyperlinks:

extractor = MarkdownExtractor(file_path="path_to_your_file.xlsx", remove_hyperlinks=False)
documents = extractor.extract()

By setting remove_hyperlinks=False, the hyperlinks in the markdown content will be preserved [1].
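To make the effect of the flag concrete, here is a minimal, self-contained sketch. It does not call Dify itself: `strip_hyperlinks` and `render` are hypothetical stand-ins, and the regular expression is only an assumption about how a markdown link-stripping step might work, so the real extractor may behave differently.

```python
import re

# Assumed pattern for an inline markdown link: [text](url).
# This is an illustration, not Dify's actual implementation.
MD_LINK = re.compile(r"\[(.*?)\]\((.*?)\)")


def strip_hyperlinks(content: str) -> str:
    """Replace each [text](url) with just its visible text."""
    return MD_LINK.sub(r"\1", content)


def render(content: str, remove_hyperlinks: bool = False) -> str:
    """Hypothetical helper mirroring the flag: strip links only when asked."""
    return strip_hyperlinks(content) if remove_hyperlinks else content


sample = "See the [project page](https://example.com/project) for details."

kept = render(sample, remove_hyperlinks=False)     # link survives as markdown
stripped = render(sample, remove_hyperlinks=True)  # only the link text remains

print(kept)
print(stripped)
```

Note that a flag like this can only preserve links that are already present in the extracted text as markdown; it does not by itself read hyperlink targets out of spreadsheet cells.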


dosubot[bot] • Jul 23 '24 11:07