
BGE-M3 runs very slowly on both CPU and GPU. How can this be fixed?

[Open] JavaTribe opened this issue 1 year ago • 4 comments

The code below runs very slowly; a single embedding call takes 17 seconds, on both CPU and GPU. How can this be fixed?

from multiprocessing import freeze_support
from FlagEmbedding import BGEM3FlagModel
import time

def main():
    # Setting use_fp16 to True speeds up computation with a slight performance degradation
    model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

    sentences_1 = ["What is BGE M3?", "Definition of BM25"]
    sentences_2 = ["BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
                   "BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document"]

    # The first encode call also pays one-off warm-up costs; only the
    # second call is timed.
    embeddings_1 = model.encode(sentences_1)['dense_vecs']
    start_time = time.time()
    embeddings_2 = model.encode(sentences_2)['dense_vecs']
    end_time = time.time()

    print("Elapsed time:", end_time - start_time)

    similarity = embeddings_1 @ embeddings_2.T
    print(similarity)

if __name__ == '__main__':
    freeze_support()
    main()
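A first thing to check when both devices seem slow is whether the model actually landed on the GPU. A minimal check (illustrative; the device argument is the same one used in the maintainer's measurements further down):

import torch
from FlagEmbedding import BGEM3FlagModel

# Confirm whether PyTorch can see a CUDA device at all.
print("CUDA available:", torch.cuda.is_available())

# Pin the model to an explicit device instead of relying on auto-detection.
model = BGEM3FlagModel('BAAI/bge-m3',
                       use_fp16=True,
                       device='cuda:0' if torch.cuda.is_available() else 'cpu')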

JavaTribe · Jan 30 '24

You can use an acceleration framework such as https://github.com/huggingface/text-embeddings-inference; see the client sketch below.
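For reference, a minimal client sketch against a TEI server, assuming one is already serving BAAI/bge-m3 on localhost:8080 (the /embed endpoint and {"inputs": ...} payload follow the TEI README; note TEI returns the dense vectors only):

import requests

# Assumes a TEI server is already serving BAAI/bge-m3 at this address.
TEI_URL = "http://localhost:8080/embed"

sentences = ["What is BGE M3?", "Definition of BM25"]

# TEI's /embed endpoint takes {"inputs": [...]} and returns one dense
# embedding vector per input sentence.
response = requests.post(TEI_URL, json={"inputs": sentences})
response.raise_for_status()
embeddings = response.json()
print(len(embeddings), "vectors of dimension", len(embeddings[0]))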

staoxiao · Jan 31 '24

Hello, is there a performance benchmark report for BGE, i.e. roughly how many tokens or samples per second?
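No official throughput report is cited in this thread, but a rough samples-per-second measurement can be taken along these lines (illustrative sketch; corpus and sizes are arbitrary):

import time
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

# Toy corpus; replace with sentences representative of your workload.
sentences = ["BM25 is a bag-of-words retrieval function."] * 256

model.encode(sentences[:8])  # warm-up so the timing excludes one-off costs

start = time.time()
model.encode(sentences)
elapsed = time.time() - start
print(f"{len(sentences) / elapsed:.1f} sentences/s")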

lyzltysgithub · Jan 31 '24

> The code below runs very slowly; a single embedding call takes 17 seconds, on both CPU and GPU. How can this be fixed?

Our test results for the code above are (Linux, A800 80GB GPU, Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz):

  • BGEM3FlagModel('BAAI/bge-m3', use_fp16=True, device='cuda:0'): 0.27s
  • BGEM3FlagModel('BAAI/bge-m3', use_fp16=False, device='cpu'): 0.42s

Note that with very little data, running on multiple GPUs may actually reduce speed; a single-GPU sketch follows below.
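One way to rule out multi-GPU dispatch overhead for small batches is to expose a single device before the model is created (illustrative):

import os

# Must be set before CUDA is initialized, hence before the model import;
# alternatively, pass device='cuda:0' as in the measurements above.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True, device='cuda:0')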

hanhainebula · Jan 31 '24

> You can use an acceleration framework such as https://github.com/huggingface/text-embeddings-inference

Is there a way to integrate the three retrieval modes (dense, sparse, multi-vector) on top of TEI inference?
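As far as I know, TEI only exposes bge-m3's dense vectors, so the three-way output still has to come from FlagEmbedding directly; the encode flags below are the ones documented in the BGE-M3 README:

from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

output = model.encode(
    ["What is BGE M3?"],
    return_dense=True,         # dense retrieval vectors
    return_sparse=True,        # lexical (sparse) token weights
    return_colbert_vecs=True,  # multi-vector (ColBERT-style) representations
)
print(output['dense_vecs'].shape)
print(output['lexical_weights'])    # one {token_id: weight} dict per sentence
print(len(output['colbert_vecs']))  # one token-level matrix per sentence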

trillionmonster · Feb 04 '24

Running it on a Mac is also very slow. Is there a good way to speed it up?
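Untested sketch for Apple Silicon: PyTorch's MPS backend may help, assuming the installed FlagEmbedding version accepts it as a device; fp16 is left off here since half precision on MPS can be unreliable:

import torch
from FlagEmbedding import BGEM3FlagModel

# MPS is PyTorch's GPU backend on Apple Silicon Macs.
if torch.backends.mps.is_available():
    model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=False, device='mps')
else:
    model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=False)  # plain CPU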

webyushao · May 29 '24

Same question.

> You can use an acceleration framework such as https://github.com/huggingface/text-embeddings-inference

> Is there a way to integrate the three retrieval modes (dense, sparse, multi-vector) on top of TEI inference?

seetimee · Jun 19 '24