unilm: Does this model only work for English texts, with long texts truncated to at most 512 tokens? [E5: Text Embeddings by Weakly-Supervised Contrastive Pre-training]

Hi, this issue is about E5: Text Embeddings by Weakly-Supervised Contrastive Pre-training (Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022).

I saw the following note on Hugging Face:

Limitations: This model only works for English texts. Long texts will be truncated to at most 512 tokens.

How can I adapt it to Chinese text? Would I need to train the model on Chinese text? If so, what hardware configuration would be needed for that?

Thank you. Looking forward to your reply.

Apr 18 '23 11:04 EasyLuck

Yes, currently it only works for English.

We'll release multilingual versions of the text embeddings in the coming month (no guarantee about the timeline, though). Please stay tuned!

Thanks, Liang

Apr 20 '23 14:04 intfloat

So looking forward to the release of multilingual E5!! 👀

May 19 '23 07:05 iamlockelightning

Do you support Chinese now?

Jul 10 '24 02:07 scy-flower