
BGE-M3 runs very slowly on both CPU and GPU. How can this be fixed?

[Open] JavaTribe opened this issue 1 year ago • 4 comments

The code below runs very slowly; a single embedding call takes 17 seconds, on both CPU and GPU. How can this be fixed?

from multiprocessing import freeze_support
from FlagEmbedding import BGEM3FlagModel
import time

def main():
    # Setting use_fp16 to True speeds up computation with a slight performance degradation
    model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

    sentences_1 = ["What is BGE M3?", "Definition of BM25"]
    sentences_2 = ["BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
                   "BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document"]

    # The first encode call also pays one-off warm-up costs; only the
    # second call is timed.
    embeddings_1 = model.encode(sentences_1)['dense_vecs']
    start_time = time.time()
    embeddings_2 = model.encode(sentences_2)['dense_vecs']
    end_time = time.time()

    print("Elapsed time:", end_time - start_time)

    similarity = embeddings_1 @ embeddings_2.T
    print(similarity)

if __name__ == '__main__':
    freeze_support()
    main()
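A first thing to check when both devices seem slow is whether the model actually landed on the GPU. A minimal check (illustrative; the device argument is the same one used in the maintainer's measurements further down):

import torch
from FlagEmbedding import BGEM3FlagModel

# Confirm whether PyTorch can see a CUDA device at all.
print("CUDA available:", torch.cuda.is_available())

# Pin the model to an explicit device instead of relying on auto-detection.
model = BGEM3FlagModel('BAAI/bge-m3',
                       use_fp16=True,
                       device='cuda:0' if torch.cuda.is_available() else 'cpu')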

JavaTribe · Jan 30 '24

You can use an acceleration framework such as https://github.com/huggingface/text-embeddings-inference; see the client sketch below.
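For reference, a minimal client sketch against a TEI server, assuming one is already serving BAAI/bge-m3 on localhost:8080 (the /embed endpoint and {"inputs": ...} payload follow the TEI README; note TEI returns the dense vectors only):

import requests

# Assumes a TEI server is already serving BAAI/bge-m3 at this address.
TEI_URL = "http://localhost:8080/embed"

sentences = ["What is BGE M3?", "Definition of BM25"]

# TEI's /embed endpoint takes {"inputs": [...]} and returns one dense
# embedding vector per input sentence.
response = requests.post(TEI_URL, json={"inputs": sentences})
response.raise_for_status()
embeddings = response.json()
print(len(embeddings), "vectors of dimension", len(embeddings[0]))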

staoxiao · Jan 31 '24

Hello, is there a performance benchmark report for BGE, i.e. roughly how many tokens or samples per second?
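No official throughput report is cited in this thread, but a rough samples-per-second measurement can be taken along these lines (illustrative sketch; corpus and sizes are arbitrary):

import time
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

# Toy corpus; replace with sentences representative of your workload.
sentences = ["BM25 is a bag-of-words retrieval function."] * 256

model.encode(sentences[:8])  # warm-up so the timing excludes one-off costs

start = time.time()
model.encode(sentences)
elapsed = time.time() - start
print(f"{len(sentences) / elapsed:.1f} sentences/s")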

lyzltysgithub · Jan 31 '24

> The code below runs very slowly; a single embedding call takes 17 seconds, on both CPU and GPU. How can this be fixed?

Our test results for the code above are (Linux, A800 80GB GPU, Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz):

  • BGEM3FlagModel('BAAI/bge-m3', use_fp16=True, device='cuda:0'): 0.27s
  • BGEM3FlagModel('BAAI/bge-m3', use_fp16=False, device='cpu'): 0.42s

Note that with very little data, running on multiple GPUs may actually reduce speed; a single-GPU sketch follows below.
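One way to rule out multi-GPU dispatch overhead for small batches is to expose a single device before the model is created (illustrative):

import os

# Must be set before CUDA is initialized, hence before the model import;
# alternatively, pass device='cuda:0' as in the measurements above.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True, device='cuda:0')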

hanhainebula · Jan 31 '24

> You can use an acceleration framework such as https://github.com/huggingface/text-embeddings-inference

Is there a way to integrate the three retrieval modes (dense, sparse, multi-vector) on top of TEI inference?
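As far as I know, TEI only exposes bge-m3's dense vectors, so the three-way output still has to come from FlagEmbedding directly; the encode flags below are the ones documented in the BGE-M3 README:

from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

output = model.encode(
    ["What is BGE M3?"],
    return_dense=True,         # dense retrieval vectors
    return_sparse=True,        # lexical (sparse) token weights
    return_colbert_vecs=True,  # multi-vector (ColBERT-style) representations
)
print(output['dense_vecs'].shape)
print(output['lexical_weights'])    # one {token_id: weight} dict per sentence
print(len(output['colbert_vecs']))  # one token-level matrix per sentence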

trillionmonster · Feb 04 '24

Running it on a Mac is also very slow. Is there a good way to speed it up?
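Untested sketch for Apple Silicon: PyTorch's MPS backend may help, assuming the installed FlagEmbedding version accepts it as a device; fp16 is left off here since half precision on MPS can be unreliable:

import torch
from FlagEmbedding import BGEM3FlagModel

# MPS is PyTorch's GPU backend on Apple Silicon Macs.
if torch.backends.mps.is_available():
    model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=False, device='mps')
else:
    model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=False)  # plain CPU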

webyushao · May 29 '24

Same question.

> You can use an acceleration framework such as https://github.com/huggingface/text-embeddings-inference

> Is there a way to integrate the three retrieval modes (dense, sparse, multi-vector) on top of TEI inference?

seetimee · Jun 19 '24