
BGE-M3 pretraining question — the loss occasionally rises

LLLiHaotian opened this issue 1 year ago • 4 comments

Is it normal for the loss to occasionally rise like this, and how should I decide when pretraining should stop?

bge-m3-patent-retromae_batch56_max350.log

LLLiHaotian · May 18 '24
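One common sanity check for occasional loss spikes is to look at a smoothed curve rather than raw step losses: a transient spike that barely moves the running mean is usually noise, while a sustained rise in the smoothed curve suggests a real problem. A minimal sketch (the loss values below are illustrative, not from the attached log):

```python
import numpy as np

def smooth(losses, window=5):
    """Running mean over a trailing window; early steps use a partial window."""
    losses = np.asarray(losses, dtype=float)
    return np.array([losses[max(0, i - window + 1): i + 1].mean()
                     for i in range(len(losses))])

# Illustrative loss curve with one transient spike at step 6.
raw = [2.0, 1.8, 1.7, 1.6, 1.55, 1.5, 2.4, 1.45, 1.4, 1.38]
smoothed = smooth(raw)
# The smoothed curve stays well below the raw spike and keeps trending
# down overall, which points to noise rather than divergence.
```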

@LLLiHaotian , you need to fine-tune the model on your downstream data, and select the best pretrain ckpt based on the downstream performance.

staoxiao · May 19 '24

I only plan to use the encoder part for representation, and there is no downstream task for the time being. So I would like to know how to determine which ckpt works best. Looking forward to your answer.

LLLiHaotian · May 19 '24

There is no appropriate metric for evaluating the pre-training task itself. We recommend selecting the ckpt based on its performance after fine-tuning on the downstream task.

staoxiao · May 19 '24
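In practice, "select the ckpt based on downstream performance" usually means: fine-tune each pretrain checkpoint on the same downstream data, embed a held-out retrieval dev set with each fine-tuned model, and keep the checkpoint with the best retrieval metric (e.g. Recall@k). A hedged sketch of the metric computation — the embeddings here are synthetic stand-ins for what each fine-tuned checkpoint would produce, and the workflow comments are assumptions, not the project's official procedure:

```python
import numpy as np

def recall_at_k(query_emb, doc_emb, gold, k=10):
    """Fraction of queries whose gold document id appears in the top-k
    results by inner-product score (embeddings assumed L2-normalised,
    so this is cosine similarity)."""
    scores = query_emb @ doc_emb.T               # (n_queries, n_docs)
    topk = np.argsort(-scores, axis=1)[:, :k]    # indices of top-k docs
    hits = [g in row for g, row in zip(gold, topk)]
    return float(np.mean(hits))

# Synthetic stand-in for a dev set: doc i is the gold document for query i,
# and queries are noisy copies of their gold documents.
rng = np.random.default_rng(0)
doc_emb = rng.normal(size=(100, 32))
doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)
gold = np.arange(10)
query_emb = doc_emb[gold] + 0.05 * rng.normal(size=(10, 32))
query_emb /= np.linalg.norm(query_emb, axis=1, keepdims=True)

score = recall_at_k(query_emb, doc_emb, gold, k=10)
# Repeat this for each fine-tuned checkpoint and pick the one with the
# highest score on the held-out dev set.
```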

> There is no appropriate metric for evaluating the pre-training task itself. We recommend selecting the ckpt based on its performance after fine-tuning on the downstream task.

After pretraining on my task-specific training dataset, what type of data should I use for fine-tuning on the downstream retrieval task? I'm unsure whether to use my own downstream data (similar sentences, not many of them) for fine-tuning, or to combine a large amount of public STS/retrieval data with my own downstream data.

friendshipity · Sep 27 '24