FlagEmbedding

Fine-tune other models + pooling & vector normalization question

Open netapy opened this issue 2 years ago • 5 comments

Hi -

First of all thanks for this great code base, it's really helpful.

I've been trying to use these scripts to fine-tune models other than BGE (e5-multilingual, since I need a more multilingual model and tokenizer), but the performance doesn't seem to be very good on my basic SentenceTransformers eval script.

I suspected the default CLS representation was at fault (I'm wondering why this is the default rather than mean pooling?), but I'm not sure; my tests don't show much of a difference.

I'm also wondering whether it may be linked to the vector normalization parameter? It's not enabled by default in other frameworks, so I'm wondering why that seems to be the case here.

Thank you in advance for your insights!

Baudouin

netapy avatar Nov 17 '23 06:11 netapy

Thanks for your interest in our work! For e5, you should set --sentence_pooling_method mean, because e5 uses mean pooling. The embeddings also need to be normalized, because e5's similarity score is cosine similarity.
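To make this concrete, here is a minimal sketch of the two pooling methods and the effect of normalization, assuming a plain Hugging Face transformers setup; the checkpoint name and the encode helper are illustrative only, not part of FlagEmbedding's training code:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint; swap in whatever model you are fine-tuning.
model_name = "intfloat/multilingual-e5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def encode(texts, pooling="mean", normalize=True):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        last_hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)
    if pooling == "cls":
        # BGE-style: use the hidden state of the first ([CLS]) token.
        emb = last_hidden[:, 0]
    else:
        # e5-style: average token embeddings, ignoring padding positions.
        mask = batch["attention_mask"].unsqueeze(-1).float()
        emb = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    if normalize:
        # L2-normalize so the dot product of two embeddings equals their cosine similarity.
        emb = F.normalize(emb, p=2, dim=-1)
    return emb

# e5 expects "query: " / "passage: " prefixes on its inputs.
q = encode(["query: how should I pool token embeddings?"], pooling="mean")
p = encode(["passage: e5 was trained with mean pooling and cosine similarity."], pooling="mean")
print((q @ p.T).item())
```

With normalization on, the dot product above is exactly the cosine score the model was trained against; training or evaluating without it changes the scale of the scores the loss sees.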

staoxiao avatar Nov 17 '23 10:11 staoxiao

Great, thanks for the clear answer. Could you elaborate on why you chose the CLS token representation rather than mean pooling? The latter always seems to produce better results in the papers I've read. Thanks!

netapy avatar Nov 20 '23 13:11 netapy

During pre-training, we use CLS pooling to represent the sentence, so we use the same pooling method in fine-tuning.

staoxiao avatar Nov 21 '23 09:11 staoxiao

> During pre-training, we use CLS pooling to represent the sentence, so we use the same pooling method in fine-tuning.

Yep, I got that, but why did you choose CLS pooling during pre-training? I suppose it led to better accuracy? I'd love to hear more about that! I would have loved a research paper for BGE; it's really good ;)

netapy avatar Nov 21 '23 10:11 netapy

We just followed the previous settings and have not tried mean pooling in pre-training.

staoxiao avatar Nov 22 '23 03:11 staoxiao