MMR Multiprocessing

Open · vjp23 opened this issue 6 months ago · 1 comment

Hi, I'm loving KeyBERT and using it for a project now. However, I'm noticing that performance is very slow at scale when using MMR. Running the embedding model on GPU speeds things up, but the bottleneck now seems to be the MMR computation on CPU. Does KeyBERT natively support multiprocessing for that step?

My plan was to break this all out: start by computing my own n-grams, then embed the n-grams and documents directly, and pass the embeddings to KeyBERT in a multiprocessing setup (i.e. map a huge list of embeddings across multiple KeyBERT processes to perform the MMR). But before I go down that road, I just want to double-check that this isn't already supported natively in KeyBERT.

vjp23 · Aug 08 '25 14:08

That's currently not supported in KeyBERT. Note that MMR is currently implemented as a sequential process in which the entire input gets reranked, so implementing multiprocessing might be a bit tricky.

MaartenGr · Aug 17 '25 05:08
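
For readers weighing the workaround described in the question, here is a minimal sketch of one way it could look. It assumes the document and candidate (n-gram) embeddings have already been computed elsewhere. It is an independent, pure-Python reimplementation of greedy MMR, not KeyBERT's own code, and the names `mmr`, `mmr_many`, and `_mmr_job` are hypothetical. The selection loop for a single document stays sequential, since each pick depends on the candidates already chosen; the parallelism is only across documents, which are independent of one another.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(doc_emb, cand_embs, top_n=5, diversity=0.5):
    """Greedy MMR for ONE document; returns indices of the chosen candidates.

    Hypothetical helper: each step depends on the set selected so far,
    so this loop cannot be split across processes.
    """
    if not cand_embs:
        return []
    relevance = [cosine(doc_emb, c) for c in cand_embs]
    # Start with the candidate most similar to the document.
    first = max(range(len(cand_embs)), key=relevance.__getitem__)
    selected = [first]
    remaining = [i for i in range(len(cand_embs)) if i != first]
    while remaining and len(selected) < top_n:
        def score(i):
            # Penalize similarity to anything already selected.
            redundancy = max(cosine(cand_embs[i], cand_embs[j]) for j in selected)
            return (1 - diversity) * relevance[i] - diversity * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def _mmr_job(job):
    """Top-level wrapper so the job can be pickled and sent to a worker process."""
    doc_emb, cand_embs, top_n, diversity = job
    return mmr(doc_emb, cand_embs, top_n, diversity)


def mmr_many(doc_embs, cand_embs_per_doc, top_n=5, diversity=0.5, workers=1):
    """Run MMR independently per document, optionally across worker processes."""
    jobs = [(d, c, top_n, diversity) for d, c in zip(doc_embs, cand_embs_per_doc)]
    if workers <= 1:
        return [_mmr_job(j) for j in jobs]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches jobs to reduce inter-process overhead.
        return list(pool.map(_mmr_job, jobs, chunksize=64))
```

With `workers` greater than 1, the call to `mmr_many` should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Whether this beats a single process depends on how much time goes into pickling the embeddings versus the MMR loop itself, so it is worth measuring on real data first.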