Gryff1ndor


In your formula (the image below), it seems that log[π(y|x)] was calculated through .sum(-1) after logits.softmax(-1), then .log(). ![image](https://github.com/eric-mitchell/direct-preference-optimization/assets/125982410/4c2a025c-b30e-40cf-9241-2d1a4c4db858) But in your code (the image below), the log[π(y|x)] was...
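
To make the ordering question concrete, here is a minimal sketch in plain Python. It assumes the usual autoregressive factorization, where log[π(y|x)] is the sum over positions of the log-probability of each target token. The function names and the toy logits are hypothetical and are not taken from the repository; the sketch only contrasts "log, then sum" with "sum, then log" on made-up numbers.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def sequence_logprob(token_logits, target_ids):
    """Log first, then sum: sum_t log pi(y_t | x, y_<t)."""
    return sum(log_softmax(row)[tok] for row, tok in zip(token_logits, target_ids))

def log_of_summed_probs(token_logits, target_ids):
    """Sum first, then log: log(sum_t pi(y_t | x, y_<t)). Shown only for contrast."""
    probs = [math.exp(log_softmax(row)[tok]) for row, tok in zip(token_logits, target_ids)]
    return math.log(sum(probs))

# Toy inputs (made up): 3 target positions, vocabulary of 4 tokens.
token_logits = [
    [2.0, 0.5, -1.0, 0.0],
    [0.1, 0.2, 3.0, -0.5],
    [1.0, 1.0, 1.0, 1.0],
]
target_ids = [0, 2, 3]

seq_lp = sequence_logprob(token_logits, target_ids)
summed_then_log = log_of_summed_probs(token_logits, target_ids)

# The product of per-token probabilities is the sequence probability,
# so its log should match the sum of per-token log-probabilities.
product_of_probs = math.prod(
    math.exp(log_softmax(row)[tok]) for row, tok in zip(token_logits, target_ids)
)

print("sum of token log-probs:", seq_lp)
print("log of product of probs:", math.log(product_of_probs))
print("log of summed probs:    ", summed_then_log)
```

On these inputs the first two printed values agree up to floating-point error, while the third is a different quantity. If the tensors in question carry a vocabulary axis or a padding mask, the same comparison applies per position before any reduction over the sequence.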