
Does this model only work for English texts, with long texts truncated to at most 512 tokens? [E5: Text Embeddings by Weakly-Supervised Contrastive Pre-training]

Open EasyLuck opened this issue 3 years ago • 3 comments

Hi, this issue is about E5: Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022.

I saw this information on Hugging Face: "Limitations: This model only works for English texts. Long texts will be truncated to at most 512 tokens."

How can I adapt it to Chinese text (中文文本)? Should I train the model on Chinese text? If so, what hardware configuration would be needed?

Thank you. Looking forward to your reply.
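To make the quoted 512-token limit concrete, here is a minimal sketch of what truncation discards and how a long text could instead be split into windows that each fit the limit. It is an illustration only: `toy_tokenize`, `truncate`, and `chunk` are hypothetical helpers, and whitespace splitting stands in for the model's real subword tokenizer, which will produce different counts and typically also spends part of the budget on special tokens.

```python
from typing import List

MAX_TOKENS = 512  # limit quoted from the model card


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: splits on whitespace only."""
    return text.split()


def truncate(tokens: List[str], max_tokens: int = MAX_TOKENS) -> List[str]:
    """Keep the first max_tokens tokens; everything after is dropped."""
    return tokens[:max_tokens]


def chunk(tokens: List[str], max_tokens: int = MAX_TOKENS) -> List[List[str]]:
    """Split tokens into consecutive, non-overlapping windows of at most max_tokens."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


tokens = toy_tokenize("word " * 1200)  # a 1200-token toy document
kept = truncate(tokens)
pieces = chunk(tokens)

print(len(tokens), len(kept), [len(p) for p in pieces])
# prints: 1200 512 [512, 512, 176]
```

With plain truncation, 688 of the 1200 toy tokens never reach the model. Chunking keeps all of them, but how to combine the per-chunk embeddings afterwards is a separate design choice that this thread does not address.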

EasyLuck avatar Apr 18 '23 11:04 EasyLuck

Yes, currently it only works for English.

We'll release multilingual versions of text embeddings in the coming month (no guarantee about the timeline though), please stay tuned!

Thanks, Liang

intfloat avatar Apr 20 '23 14:04 intfloat

So looking forward to the release of multilingual E5!! 👀

iamlockelightning avatar May 19 '23 07:05 iamlockelightning

Do you support Chinese now?

scy-flower avatar Jul 10 '24 02:07 scy-flower