Knowledge base search cannot reliably match serial or model numbers
Routine checks
- [x] I have confirmed that there is currently no similar issue
- [x] I have fully reviewed the project README, as well as project documentation
- [x] I used my own key and confirmed that my key can be used normally
- [x] I understand and am willing to follow up on this issue, assist in testing and provide feedback
- [x] I understand and accept the above, and I understand that the project maintainers have limited time; issues that do not follow the rules may be ignored or closed directly
Your version
- [ ] Public cloud version
- [x] Private deployment version
Problem description
When the knowledge base is searched by a serial or model number, neither semantic retrieval nor full-text retrieval finds the content that contains that number. In typical commercial deployments, this would cause the bot to give irrelevant answers.
Steps to reproduce
1. Import a knowledge base that contains serial-number/model data.
2. Add the data through both Q&A training and document upload, so that the number keywords are covered by the knowledge base multiple times.
3. Query by number: neither semantic retrieval nor full-text retrieval points to the corresponding knowledge.
Expected result
Paragraphs containing the relevant serial/model number are retrieved and passed to the AI to answer the question. I suggest adding a dedicated detection step and retrieval mode for number-type queries.
Related screenshots
Hybrid retrieval plus reranking should be enough.
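For readers unfamiliar with hybrid retrieval, here is a minimal sketch of one common way to merge the semantic and full-text result lists: reciprocal rank fusion (RRF). This is an assumption for illustration, not necessarily how this project fuses results; the function name, the chunk ids, and the constant `k=60` are all hypothetical, and the reranking model that would run afterwards is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranked list.

    Each chunk receives sum(1 / (k + rank)) over the lists it appears in,
    so a chunk ranked well by either retriever rises to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists from the two retrievers (best hit first).
semantic_hits = ["c3", "c1", "c7"]
fulltext_hits = ["c7", "c3", "c9"]

fused = reciprocal_rank_fusion([semantic_hits, fulltext_hits])
print(fused)
```

Note that fusion only helps if at least one retriever ranks the chunk with the number highly; if full-text search tokenizes the number badly, the fused list inherits that miss.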
Marking this. The BM25 tokenization is not very good; we will look into whether there is a better tokenization method later.
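To illustrate the tokenization point, here is a rough sketch of a tokenizer that keeps alphanumeric codes such as `AB-1234/X` as a single token instead of splitting them at punctuation. It is only an assumption about what a "better" tokenizer could look like: the regex, the choice of separators, and the fallback to single CJK characters are all hypothetical and are not taken from this project's code.

```python
import re

# One token is either:
#   - an alphanumeric run, optionally continued by -, _, . or / plus more
#     alphanumerics (so "AB-1234/X" stays in one piece), or
#   - a single CJK character (a crude stand-in for real word segmentation).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-_./][A-Za-z0-9]+)*|[\u4e00-\u9fff]")


def tokenize(text):
    """Return lowercase tokens, keeping codes like 'AB-1234/X' intact."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


tokens = tokenize("型号 AB-1234/X 的参数")
print(tokens)
```

The same tokenizer would have to be applied at both index time and query time, otherwise the code tokens in the query will not line up with those in the index.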
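I suggest adding traditional exact matching to the knowledge base matching options, for Q&A scenarios that involve model and serial numbers. Users could either configure a regular expression for their numbers or use an extraction feature.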
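A minimal sketch of what this suggestion could look like: extract identifiers from the query with a user-supplied regular expression, then return only the chunks that contain one of them verbatim. Everything here is hypothetical, including the pattern (2 to 4 capital letters, a hyphen, 3 to 6 digits), the function names, and the sample chunks; a real deployment would let each user define their own pattern.

```python
import re

# Hypothetical user-configured pattern. Lookarounds are used instead of \b
# because \b does not fire between a CJK character and a Latin letter.
ID_PATTERN = re.compile(r"(?<![A-Za-z0-9])[A-Z]{2,4}-\d{3,6}(?![A-Za-z0-9])")


def extract_ids(text):
    """Return the set of identifiers found in the text."""
    return set(ID_PATTERN.findall(text))


def exact_match(query, chunks):
    """Return the chunks that share at least one identifier with the query."""
    wanted = extract_ids(query)
    if not wanted:
        # No identifier in the query: the caller falls back to normal retrieval.
        return []
    return [chunk for chunk in chunks if wanted & extract_ids(chunk)]


chunks = [
    "AB-1234: rated voltage 220 V",
    "AB-1235: rated voltage 110 V",
    "General installation notes",
]
hits = exact_match("What is the rated voltage of AB-1234?", chunks)
print(hits)
```

Exact-match hits could be placed ahead of the semantic and full-text results, so that a query naming a specific number always surfaces the paragraph for that number first.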