direct-preference-optimization
How to guarantee that output.logits.shape[:-1] == labels.shape
How can I guarantee that the two shapes are the same?
When I train a custom LLM with DPO, the loss does not converge. Could the reason be that the two shapes are different?
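For illustration, here is a minimal sketch (not from the original question; the model name, prompt, and response are placeholders) of how labels are typically built for a causal LM so that output.logits.shape[:-1] == labels.shape holds: labels are a copy of input_ids, with the prompt positions masked to -100, so they keep the same (batch, seq_len) shape as the inputs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, swap in your custom LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "What is DPO?"                     # placeholder prompt
chosen = " Direct Preference Optimization."  # placeholder chosen response

# Tokenize the prompt alone and the prompt+response together,
# so the full sequence defines one shared sequence length.
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + chosen, return_tensors="pt").input_ids

# Labels are a copy of input_ids with the prompt part masked out (-100),
# so labels has the same (batch, seq_len) shape as input_ids.
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

with torch.no_grad():
    output = model(input_ids=full_ids)

# logits: (batch, seq_len, vocab_size); labels: (batch, seq_len)
assert output.logits.shape[:-1] == labels.shape
```

If the shapes still disagree in your pipeline, the mismatch usually comes from building labels with a different tokenization or truncation than the input_ids passed to the model.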