
Add reranking support

Open donguyen32 opened this issue 1 year ago • 10 comments

According to https://github.com/ggerganov/llama.cpp/pull/9510, llama.cpp now supports reranking models such as https://huggingface.co/BAAI/bge-reranker-v2-m3. Please add support for this in llama-cpp-python.

donguyen32 avatar Oct 14 '24 07:10 donguyen32

@abetlen Sorry but do you have any plans to implement this?

donguyen32 avatar Oct 14 '24 07:10 donguyen32

Hi @donguyen32 I have been thinking the same thing and have just submitted a PR that adds a rank method to the High-Level API. I don't know whether it will be merged, but it would be helpful to be able to do reranking with llama-cpp-python.

yutyan0119 avatar Nov 03 '24 05:11 yutyan0119

@yutyan0119 According to the original repo, the format of the rerank task is [BOS]query[EOS][SEP]doc[EOS] (https://github.com/ggerganov/llama.cpp/blob/9f409893519b4a6def46ef80cd6f5d05ac0fb157/examples/server/utils.hpp#L185-L196), but your inputs are [f"{query}</s><s>{doc}" for doc in documents]. Please check it.
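
For reference, here is a minimal sketch of how the two layouts compare as plain strings. The token names are an assumption based on bge-reranker-v2-m3's XLM-RoBERTa-style tokenizer, where BOS is <s> and both EOS and SEP are </s>; please verify against the model's GGUF metadata.

query = "what is panda?"
documents = ["hi", "it's a bear"]

# Server layout from format_rerank(): [BOS]query[EOS][SEP]doc[EOS]
server_inputs = [f"<s>{query}</s></s>{doc}</s>" for doc in documents]

# Layout quoted above from the PR branch: query[EOS][BOS]doc (no trailing EOS)
pr_inputs = [f"{query}</s><s>{doc}" for doc in documents]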

donguyen32 avatar Nov 04 '24 04:11 donguyen32

@donguyen32 Thanks for your comment! Actually, I was looking at examples/embedding/embedding.cpp, so I think there are some differences from the server implementation.

I have verified that the output is the same as the original implementation with the following command:

./llama-embedding \
    -m models/bge-reranker-v2-m3/ggml-model-f16.gguf \
    -p "what is panda?</s><s>hi\nwhat is panda?</s><s>it's a bear\nwhat is panda?</s><s>The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China." \
    --pooling rank --embd-normalize -1 --verbose-prompt 

The same command also seems to be used for testing on CI. https://github.com/ggerganov/llama.cpp/blob/a9e8a9a0306a8093eef93b0022d9f45510490072/ci/run.sh#L755

In fact, I do not know how these separator tokens affect reranking accuracy. If you know, please let me know.

And if we want a return value in the same format as the server's, I think it would be better to add a separate method alongside create_embedding and embed, for example a create_rank method.
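
A rough sketch of the shape I have in mind (the create_rank name, its signature, and the return format are all hypothetical here, modelled on the existing create_embedding method):

# Hypothetical API sketch -- create_rank does not exist in llama-cpp-python yet.
from llama_cpp import Llama

llm = Llama(
    model_path="models/bge-reranker-v2-m3/ggml-model-f16.gguf",
    embedding=True,   # reranker models run through the embedding code path
    pooling_type=4,   # assumed to correspond to LLAMA_POOLING_TYPE_RANK in llama.h
)

response = llm.create_rank(
    query="what is panda?",
    documents=["hi", "it's a bear"],
)
# Assumed return shape, mirroring the llama.cpp server's rerank response:
# {"results": [{"index": 0, "relevance_score": ...},
#              {"index": 1, "relevance_score": ...}]}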

yutyan0119 avatar Nov 05 '24 03:11 yutyan0119

I am wondering, is this feature planned?

thiner avatar Dec 06 '24 04:12 thiner

Hello @yutyan0119 @donguyen32 @abetlen

Could you please share any progress on this?

I think this is a super important PR!

KanishkNavale avatar Dec 13 '24 15:12 KanishkNavale

Bump, waiting for that too.

yazon avatar Mar 27 '25 13:03 yazon

This is critical for my use case as well.

handshape avatar May 30 '25 11:05 handshape

@KanishkNavale @yazon @handshape Please see PR https://github.com/abetlen/llama-cpp-python/pull/1820. I think it works well.

donguyen32 avatar Jun 02 '25 01:06 donguyen32

I only reviewed the PR just now; however, I have been using that branch for re-ranking and it works flawlessly for me.

KanishkNavale avatar Jun 02 '25 09:06 KanishkNavale