
[Feature] The length of vectorized chunks supports customization

Open devnotperfect opened this issue 6 months ago • 2 comments

MaxKB Version

v2.0.1

Please describe your needs or suggestions for improvements

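Currently, default vectorization truncates each chunk at the last complete sentence before the 256-character mark and then vectorizes it. When testing with embedding models that accept long token inputs, such as bge-m3 and qwen-embedding, retrieval recall is unsatisfactory. Could the truncation length limit be made customizable when vectorizing a knowledge base or a document?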
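To make the request concrete, here is a minimal sketch of the truncation rule described above, with the 256-character limit turned into a parameter. It is not MaxKB's actual implementation: the function name `truncate_for_embedding`, the `max_chars` parameter, and the set of sentence-ending characters are all hypothetical assumptions for illustration.

```python
import re

# Hypothetical sentence terminators for Chinese and English text. This is an
# assumption: MaxKB's real delimiter set is not stated in this issue. Note that
# a bare "." also matches decimal points, so this is only a naive sketch.
_SENTENCE_END = re.compile(r"[。！？!?.;；\n]")


def truncate_for_embedding(text: str, max_chars: int = 256) -> str:
    """Hypothetical helper: return the longest prefix of `text` that fits in
    `max_chars` characters and ends on a sentence boundary.

    Falls back to a hard cut at `max_chars` when no boundary is found.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return text  # already short enough; keep the whole text
    window = text[:max_chars]
    # Track where the last complete sentence inside the window ends.
    last_end = -1
    for match in _SENTENCE_END.finditer(window):
        last_end = match.end()
    if last_end <= 0:
        return window  # no complete sentence fits; hard cut
    return window[:last_end]


if __name__ == "__main__":
    doc = "First sentence. Second sentence. Third one is cut"
    print(truncate_for_embedding(doc, 20))  # First sentence.
    print(truncate_for_embedding(doc, 40))  # First sentence. Second sentence.
```

With a fixed limit of 256, long-input models only ever see short prefixes; exposing `max_chars` (or an equivalent setting) would let users pass a larger value for models like bge-m3 and compare recall themselves.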

Please describe the solution you suggest

No response

Additional Information

No response

devnotperfect, Jul 30 '25 06:07

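Thanks for the feedback. Later on we could consider exposing this in the parameter settings, the same way it is done for large language models.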

baixin513, Jul 31 '25 01:07

The bot detected that the issue body's language is not English and translated it automatically.

Thanks for the feedback. In the future we may consider putting this in the parameter settings, as is done for large language models.

shaohuzhang1, Jul 31 '25 01:07