direct-preference-optimization
Reproducing Win Rate inference for TL;DR
Hi, I have been trying to reproduce the summarization win rates from the paper, but I'm struggling to get similar values. Have you run into this as well? Could it be due to changes made to GPT-4 since the results were published?
Thank you!
Hi @jdchang1, may I ask how you set up the summarization task? Did you just change the dataset to TL;DR?