Yata

Results 13 comments of Yata

The paper doesn't say much about how the value part is trained.

Hi, I ran into the same question about extract_image_features.bin. Could you tell me how you solved this problem?

I'd like to know how you solved the first problem. Thanks!

I have the same question. Is the contrast only between the chosen and rejected responses? That is, are all chosen responses and their augmentations mutual positives, while the positives are contrasted against all rejected responses and their augmentations as negatives?

@Ablustrund Could you please answer this?

@Ablustrund Thanks for your reply. I'm asking about the details of the contrastive learning. In the modeling, the pairwise data is (x, good, bad). Is the contrastive procedure then to concatenate (x, good), pass it through the model twice with dropout, and treat the two resulting features as a positive pair, while the features of all other (x, good) and (x, bad) pairs in the same batch serve as negatives? I'd like to know these concrete modeling details. Also, I don't quite understand how contrastive learning would be applied directly to the diff.
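For reference, here is a minimal sketch of the setup the question describes (SimCSE-style dropout positives with in-batch negatives). The encoder interface, batch layout, and temperature are my assumptions for illustration, not the authors' confirmed implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(encoder, x_good, x_bad, temperature=0.05):
    """Sketch of contrastive learning over (x, good) / (x, bad) pairs.

    `encoder` is assumed to return one feature vector per input sequence and
    to be in train mode, so two forward passes of the same input give two
    different dropout "views".
    """
    # Two dropout views of the concatenated (x, good) sequences -> positive pairs.
    z1 = encoder(x_good)              # (batch, dim)
    z2 = encoder(x_good)              # (batch, dim), different dropout mask
    # One view of the concatenated (x, bad) sequences -> extra in-batch negatives.
    zb = encoder(x_bad)               # (batch, dim)

    z1 = F.normalize(z1, dim=-1)
    candidates = F.normalize(torch.cat([z2, zb], dim=0), dim=-1)  # (2*batch, dim)

    # Row i's positive is candidates[i] (its own second dropout view);
    # every other row (other good responses and all bad responses) is a negative.
    logits = z1 @ candidates.T / temperature                      # (batch, 2*batch)
    labels = torch.arange(z1.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```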

The IPO loss minimizes the distance between the logits and 1/(2*beta), rather than minimizing the logits themselves. You can compare the gradients of the IPO loss and the DPO loss.
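A minimal sketch of the two losses for comparison, following the standard DPO/IPO formulations; the variable names and the mean reduction are mine, not taken from any particular codebase:

```python
import torch
import torch.nn.functional as F

def dpo_and_ipo_losses(pi_logratios, ref_logratios, beta=0.1):
    """pi_logratios:  log pi(y_w|x) - log pi(y_l|x) under the policy.
    ref_logratios: the same quantity under the reference model."""
    logits = pi_logratios - ref_logratios

    # DPO: pushes the logits up without a finite target
    # (the gradient only decays through the sigmoid).
    dpo_loss = -F.logsigmoid(beta * logits)

    # IPO: a squared loss pulling the logits toward the target 1/(2*beta),
    # so the gradient changes sign once the logits overshoot that target.
    ipo_loss = (logits - 1 / (2 * beta)) ** 2

    return dpo_loss.mean(), ipo_loss.mean()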

> I have another possible explanation from math.
>
> We understand that $\pi_\theta(y_w) = \pi_\theta(y_w^1)\,\pi_\theta(y_w^2 \mid y_w^1) \cdots \pi_\theta(y_w^{n_w} \mid y_w^1 \cdots y_w^{n_w-1})$, where $y_w^i$ denotes the $i$-th token of $y_w$.
>
> Consequently, $\pi_\theta(y_w \mid x) = \pi_\theta(x \mid y_w)\,\pi_\theta(y_w) / \pi_\theta(x) = \pi_\theta(x \mid y_w)\,\pi_\theta(y_w^1)\,\pi_\theta(y_w^2 \mid y_w^1) \cdots \pi_\theta(y_w^{n_w} \mid y_w^1 \cdots y_w^{n_w-1}) / \pi_\theta(x)$.
>
> Similarly, $\pi_\theta(y_l \mid x) = \pi_\theta(x \mid y_l)\,\pi_\theta(y_l) / \pi_\theta(x) = \pi_\theta(x \mid y_l)\,\pi_\theta(y_l^1)\,\pi_\theta(y_l^2 \mid y_l^1) \cdots \pi_\theta(y_l^{n_l} \mid y_l^1 \cdots y_l^{n_l-1}) / \pi_\theta(x)$....

> Hi all, could somebody please explain to me why `average_log_prob=False` makes the model generate longer responses? Any hints/clarifications are appreciated. I've noticed that the model tends to...
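For context, a minimal sketch of what that flag controls in a DPO-style log-prob computation (mask and shape handling simplified; this is my reading of the setup, not a verbatim copy of any repository):

```python
import torch

def batch_logps(token_logps, loss_mask, average_log_prob=False):
    """token_logps: (batch, seq_len) per-token log-probs of the label tokens.
    loss_mask:   (batch, seq_len) 1 for response tokens, 0 for prompt/padding."""
    summed = (token_logps * loss_mask).sum(-1)
    if average_log_prob:
        # Length-normalized: each response contributes a per-token average,
        # so long and short responses are put on the same scale.
        return summed / loss_mask.sum(-1)
    # Unnormalized sum: the sequence log-prob (and hence the preference logits)
    # scales with response length, which is the usual suspect for the length
    # bias people observe with average_log_prob=False.
    return summed
```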