
[LLM_RETRIEVER] Some questions about the LLM Retriever paper

zhouchang123 opened this issue 1 year ago • 2 comments

Q1: The paper says the BM25 retriever is the initial model. Do you mean the cross-encoder is used to tune the BM25 retriever?

Q2: In Section 4.2, what is the function of s(x, y, x_i, y_i)? In Section 4.3, what is the function of L_cont? Is it the same as L_reward in Section 4.2?

Q3: How is the retriever trained? I don't understand the order of the training pipeline. Does the paper mean first using the retriever to get candidates, then choosing positive and negative candidates to train the reward model, and after that using L_cont and L_distill to tune the retriever? To me it seems the reward model is trained first, and then the retriever? @intfloat Can you explain this to me?

zhouchang123 avatar Oct 22 '24 10:10 zhouchang123

Hi @zhouchang123 ,

Q1: No. BM25 is unsupervised; it does not need any training.
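As a rough illustration (this is not the repository's code; the `rank_bm25` package, the toy candidate pool, and the query here are my own assumptions), BM25 only needs a tokenized corpus and scores candidates statistically, with no parameters to train:

```python
# Minimal sketch: BM25 retrieval with no training step.
from rank_bm25 import BM25Okapi

candidate_pool = [
    "translate english to french : good morning",
    "what is the capital of france ?",
    "summarize the following article : ...",
]
tokenized_pool = [doc.split() for doc in candidate_pool]

bm25 = BM25Okapi(tokenized_pool)  # built directly from the corpus, no gradient updates
query = "translate english to french : good night".split()
top_candidates = bm25.get_top_n(query, candidate_pool, n=2)
print(top_candidates)
```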

Q2: s(x, y, x_i, y_i) is a real-valued score produced by the reward model (a BERT-based encoder). L_cont is the InfoNCE contrastive loss from "A Simple Framework for Contrastive Learning of Visual Representations" (SimCLR).
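In case it helps, here is a minimal InfoNCE sketch in PyTorch (not the actual training code; the tensor shapes, function name, and temperature value are assumptions): the query embedding is contrasted against one positive and several negative candidate embeddings.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb, pos_emb, neg_embs, temperature=0.05):
    """InfoNCE contrastive loss over one positive and k negatives.

    query_emb: (d,)   retriever embedding of the query
    pos_emb:   (d,)   embedding of the positive candidate
    neg_embs:  (k, d) embeddings of the negative candidates
    """
    query_emb = F.normalize(query_emb, dim=-1)
    candidates = F.normalize(torch.cat([pos_emb.unsqueeze(0), neg_embs], dim=0), dim=-1)
    logits = candidates @ query_emb / temperature   # (k + 1,) similarity scores
    target = torch.zeros(1, dtype=torch.long)       # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)
```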

Q3: It is an iterative process. We first use the retriever to get candidates, then choose positives and negatives to train the reward model; after that, the reward model is used to tune the retriever again, and so on. At the start of training we do not have any trained retriever or reward model, so we use the unsupervised BM25 as the initial retriever.
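To make the order concrete, here is a sketch of that loop (the function and class names are hypothetical placeholders, not this repo's actual API):

```python
# Sketch of the iterative pipeline described above; all helpers are placeholders.
retriever = BM25Retriever()  # iteration 0: unsupervised, no training needed

for iteration in range(num_iterations):
    # 1. Retrieve candidate in-context examples for each training query.
    candidates = retriever.retrieve(train_queries, top_k=100)

    # 2. Rank candidates with LLM feedback to pick positives / negatives,
    #    then train the reward model on them.
    positives, negatives = rank_by_llm_feedback(candidates)
    reward_model = train_reward_model(positives, negatives)

    # 3. Train a new dense retriever with L_cont (contrastive) plus
    #    L_distill (distilling the reward model's scores).
    retriever = train_dense_retriever(positives, negatives, teacher=reward_model)
```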

intfloat avatar Oct 24 '24 02:10 intfloat

I know the initial retriever is BM25. What does it mean that the initial BM25 retriever doesn't need training, while the reward model is used to tune the retriever? What does the tuning actually do here? Am I misunderstanding something?

zhouchang123 avatar Oct 24 '24 04:10 zhouchang123