BGE-M3 runs very slowly on both CPU and GPU — how can this be fixed?
The code below runs very slowly: a single embedding call takes 17 seconds, on both CPU and GPU. How can I fix this?
```python
from multiprocessing import freeze_support
from FlagEmbedding import BGEM3FlagModel
import time

def main():
    # Setting use_fp16 to True speeds up computation with a slight performance degradation
    model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

    sentences_1 = ["What is BGE M3?", "Definition of BM25"]
    sentences_2 = ["BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
                   "BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document"]

    # The first encode call happens before timing, so it also absorbs warm-up cost;
    # only the second call is measured.
    embeddings_1 = model.encode(sentences_1)['dense_vecs']

    start_time = time.time()
    embeddings_2 = model.encode(sentences_2)['dense_vecs']
    end_time = time.time()
    print("Elapsed time:", end_time - start_time)

    similarity = embeddings_1 @ embeddings_2.T
    print(similarity)

if __name__ == '__main__':
    freeze_support()
    main()
```
You can use an acceleration method such as https://github.com/huggingface/text-embeddings-inference
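For reference, a minimal sketch of querying a running TEI server through its `/embed` endpoint; the URL/port and the assumption that the server was launched with `--model-id BAAI/bge-m3` are mine, not from this thread:

```python
import requests

# Assumes a text-embeddings-inference server is already running locally
# (e.g. started with --model-id BAAI/bge-m3); port 8080 is an assumption.
TEI_URL = "http://127.0.0.1:8080/embed"

def embed(texts):
    # TEI's /embed endpoint takes a JSON body with an "inputs" field
    # and returns one dense vector per input text.
    resp = requests.post(TEI_URL, json={"inputs": texts}, timeout=30)
    resp.raise_for_status()
    return resp.json()

vectors = embed(["What is BGE M3?", "Definition of BM25"])
print(len(vectors), len(vectors[0]))  # number of texts, embedding dimension
```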
Hello, is there a performance benchmark report for BGE? Roughly how many tokens or samples per second?
Our test results for the code above (Linux, A800 80GB GPU, Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz):

- `BGEM3FlagModel('BAAI/bge-m3', use_fp16=True, device='cuda:0')`: 0.27 s
- `BGEM3FlagModel('BAAI/bge-m3', use_fp16=False, device='cpu')`: 0.42 s

Note that with very little data, running on multiple GPUs may actually slow things down.
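If the slowdown comes from multi-GPU dispatch or from long default sequence lengths, a sketch along these lines may help; `device='cuda:0'` follows the maintainer's reply above, while the `batch_size` and `max_length` values are illustrative (the BGE-M3 README shows both parameters on `encode` and notes that a smaller `max_length` speeds up encoding):

```python
from FlagEmbedding import BGEM3FlagModel

# Pin the model to a single GPU, matching the maintainer's test setup above;
# with only a handful of sentences, multi-GPU dispatch adds overhead.
model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True, device='cuda:0')

sentences = ["What is BGE M3?", "Definition of BM25"]

# Capping max_length well below BGE-M3's 8192-token maximum avoids
# unnecessary padding/attention cost for short inputs.
embeddings = model.encode(sentences, batch_size=12, max_length=512)['dense_vecs']
print(embeddings.shape)
```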
> You can use an acceleration method such as https://github.com/huggingface/text-embeddings-inference
Is there a way to do the three-way integration (dense, sparse, and multi-vector) on top of TEI inference?
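The thread doesn't answer whether TEI can serve all three outputs. For reference, a sketch of obtaining the three representations locally with FlagEmbedding, using the parameter names from the BGE-M3 README:

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True, device='cuda:0')

sentences = ["What is BGE M3?"]

# Request all three representations in one pass: dense vectors,
# sparse lexical weights, and per-token (multi-vector / ColBERT) vectors.
out = model.encode(sentences,
                   return_dense=True,
                   return_sparse=True,
                   return_colbert_vecs=True)

print(out['dense_vecs'].shape)       # dense embedding per sentence
print(out['lexical_weights'][0])     # token -> weight dict (sparse signal)
print(out['colbert_vecs'][0].shape)  # per-token vectors for multi-vector scoring
```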
It is also very slow for me on a Mac. Is there a good way to fix this?
Same question here.