MaxKB: integrate an open-source PDF OCR project to recognize scanned PDFs

MaxKB version

v1.2.0

Describe your requirement or suggested improvement

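First of all, thank you to the developers for open-sourcing such a great project! Many PDF documents are scans, and MaxKB cannot recognize them properly.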

Describe your proposed implementation

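I'd like PDF OCR to be added, so that an imported PDF is first run through OCR. The usual approach is to convert each page of the PDF to an image and then run recognition on that image. This open-source project could serve as a reference: https://github.com/hiroi-sora/Umi-OCR. Its OCR results are quite good.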
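
To make the page-to-image-then-OCR idea concrete, here is a minimal sketch. It is not MaxKB code and does not use Umi-OCR. It assumes PyMuPDF for rendering and Tesseract (via `pytesseract`, with the `chi_sim` and `eng` language data installed) as a stand-in OCR engine. The function names `ocr_scanned_pdf`, `render_with_pymupdf` and `recognize_with_tesseract` are hypothetical and exist only for illustration.

```python
from typing import Callable, Iterable, List


def ocr_scanned_pdf(
    pdf_path: str,
    render_pages: Callable[[str], Iterable[bytes]],
    recognize: Callable[[bytes], str],
) -> List[str]:
    """Return one text string per page: render each page to an image, then OCR it.

    The renderer and the OCR engine are passed in, so either can be swapped
    (for example, for a different OCR backend) without touching this function.
    """
    return [recognize(image).strip() for image in render_pages(pdf_path)]


def render_with_pymupdf(pdf_path: str, dpi: int = 200) -> Iterable[bytes]:
    """Yield each page as PNG bytes. Assumption: PyMuPDF is installed."""
    import fitz  # PyMuPDF; imported lazily so the module loads without it

    with fitz.open(pdf_path) as doc:
        for page in doc:
            yield page.get_pixmap(dpi=dpi).tobytes("png")


def recognize_with_tesseract(png: bytes, lang: str = "chi_sim+eng") -> str:
    """OCR one PNG image. Assumption: pytesseract, Pillow and Tesseract are installed."""
    import io

    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(io.BytesIO(png)), lang=lang)
```

With those dependencies installed, `ocr_scanned_pdf("scan.pdf", render_with_pymupdf, recognize_with_tesseract)` would return the recognized text page by page. A higher `dpi` usually helps with small print but makes rendering and recognition slower.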

Additional information

No response

Jun 13 '24 06:06 HonorWater

Thanks for the feedback. We'll look into it first.

Jun 13 '24 09:06 baixin513

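We do not provide OCR tools for now. You can write your own parsing tool and use it in an application workflow. Knowledge bases will also support workflows later.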

Oct 23 '25 03:10 baixin513
