Does this model only work for English texts? Are long texts truncated to at most 512 tokens? [E5: Text Embeddings by Weakly-Supervised Contrastive Pre-training]
Hi, this issue is about E5: Text Embeddings by Weakly-Supervised Contrastive Pre-training (Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022). I see the following under Limitations on Hugging Face: "This model only works for English texts. Long texts will be truncated to at most 512 tokens." How can I adapt it to Chinese text? By training the model on Chinese text? If so, what hardware configuration would that require? Thank you, looking forward to your reply.
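As a side note on the second limitation: truncation happens at the tokenizer level, so anything past the first 512 tokens is simply dropped before embedding. A minimal sketch of that behavior, using a toy whitespace tokenizer as a stand-in for the model's real subword tokenizer (the exact loading code is not shown in this thread):

```python
# Toy illustration of the 512-token truncation mentioned in the model card.
# A real setup would use the model's own subword tokenizer, e.g. via
# Hugging Face transformers: tokenizer(text, truncation=True, max_length=512).

MAX_TOKENS = 512

def tokenize(text: str) -> list[str]:
    """Stand-in tokenizer: splits on whitespace (real models use subwords)."""
    return text.split()

def truncate_tokens(tokens: list[str], max_len: int = MAX_TOKENS) -> list[str]:
    """Keep only the first max_len tokens; everything beyond is dropped."""
    return tokens[:max_len]

long_text = " ".join(f"word{i}" for i in range(600))  # 600 toy tokens
tokens = truncate_tokens(tokenize(long_text))
print(len(tokens))  # → 512
```

Because subword tokenizers usually emit more tokens than whitespace splitting, real text hits the 512-token ceiling sooner than a word count would suggest.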
Yes, currently it only works for English.
We'll release multilingual versions of the text embeddings in the coming month (no guarantee on the timeline, though). Please stay tuned!
Thanks, Liang
So looking forward to the release of multilingual E5!! 👀
Do you support Chinese now?