[Feature] Support customizing the length of vectorized chunks

Open devnotperfect opened this issue 6 months ago • 2 comments

v2.0.1

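Currently, vectorization by default truncates the text at the last complete sentence within the first 256 characters and then embeds the result. In tests with embedding models that support long token inputs, such as bge-m3 and qwen-embedding, retrieval recall is unsatisfactory. Could the truncation length limit be made customizable when vectorizing a knowledge base or a document?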
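To make the request concrete, here is a minimal sketch of sentence-boundary truncation with a configurable limit. It is not the project's actual implementation: the names `truncate_at_sentence`, `split_into_chunks`, and `max_chars` are hypothetical, and the punctuation set used to detect sentence ends is an assumption (it treats every `.` as a sentence end, which is a simplification).

```python
import re

# Assumed set of sentence terminators, covering Chinese and English text.
SENTENCE_END = re.compile(r"[。！？!?.\n]")


def truncate_at_sentence(text: str, max_chars: int = 256) -> str:
    """Return the longest prefix within max_chars that ends on a sentence boundary."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Positions just past each sentence terminator found inside the window.
    ends = [m.end() for m in SENTENCE_END.finditer(window)]
    # Fall back to a hard cut when the window holds no complete sentence.
    return window[: ends[-1]] if ends else window


def split_into_chunks(text: str, max_chars: int = 256) -> list[str]:
    """Split text into consecutive chunks, each produced by truncate_at_sentence."""
    chunks = []
    rest = text
    while rest:
        chunk = truncate_at_sentence(rest, max_chars)
        chunks.append(chunk)
        rest = rest[len(chunk):]
    return chunks
```

With a parameter like `max_chars` exposed per knowledge base or per document, a long-context embedding model could be given larger chunks, for example `split_into_chunks(document_text, max_chars=1024)`, while the default of 256 stays unchanged for other models.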


Jul 30 '25 06:07 devnotperfect

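Thanks for the feedback. We can consider exposing this in the parameter settings in the future, as is done for large language models.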

Jul 31 '25 01:07 baixin513

Bot detected that the issue body's language is not English and translated it automatically.

Thanks for the feedback. We can consider exposing this in the parameter settings in the future, as is done for large language models.

Jul 31 '25 01:07 shaohuzhang1