Great code! Can you share more detail: did you fine-tune Mistral, or only use smart prompting?
Per the reference https://github.com/FlagOpen/FlagEmbedding/blob/master/Tutorials/1_Embedding/1.2.5_BGE_EN_ICL.ipynb, BGE-EN-ICL uses a decoder-only LLM, Mistral-7B. However, the paper (https://arxiv.org/pdf/2409.15700) appears to contradict itself:
"Surprisingly, the best results are obtained using the original, unmodified architecture."

vs.

"Training Detail. We fine-tune the Mistral-7B model using a contrastive loss and conduct the process over a single epoch. For efficient fine-tuning, we employ Low-Rank Adaptation (LoRA) (Hu et al., 2021), setting the LoRA rank to 64 and the LoRA alpha to 32, with a learning rate of 1e-4. For retrieval tasks, we use in-batch negatives, a strategy not adopted for other tasks. Each dataset incorporates 7 hard negatives."
So what was actually done?
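For concreteness, here is a minimal sketch of what the quoted "Training Detail" paragraph describes: LoRA fine-tuning of Mistral-7B with a contrastive loss over in-batch negatives, using the stated rank 64, alpha 32, and learning rate 1e-4. This is not the authors' code; the target modules, pooling strategy, dropout, and temperature are assumptions, since the paper does not specify them here.

```python
# Sketch only: LoRA + contrastive (InfoNCE) setup matching the quoted hyperparameters.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"  # base decoder-only model named in the tutorial
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# LoRA settings quoted from the paper: rank 64, alpha 32.
# target_modules and dropout are assumptions, not stated in the quote.
lora_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    r=64,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base_model, lora_config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # learning rate from the quote

def last_token_embed(outputs, attention_mask):
    # Pool the hidden state of the last non-padding token,
    # a common choice for decoder-only embedding models (assumed here).
    last_idx = attention_mask.sum(dim=1) - 1
    return outputs.last_hidden_state[torch.arange(last_idx.size(0)), last_idx]

def contrastive_loss(q_emb, p_emb, temperature=0.02):
    # InfoNCE with in-batch negatives: each query's positive passage is the
    # diagonal entry; all other passages in the batch act as negatives.
    # The temperature value is an assumption.
    q_emb = F.normalize(q_emb, dim=-1)
    p_emb = F.normalize(p_emb, dim=-1)
    scores = q_emb @ p_emb.T / temperature
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)
```

If this roughly matches your setup, then "unmodified architecture" and "LoRA fine-tuning" may both be true at once: the structure stays the same while adapter weights are trained. Is that the case, or was no training done at all?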