LMOps
[dkpd] what's the motivation?
I've read the dkpd paper, and the experimental results show that dkpd works. However, I don't really see why one would apply DPO to KD in the first place, or how it should improve on the traditional KLD or reverse KLD methods. Can you explain that to me?