panml
Idea: look into DPO for model tuning
The idea is to use Direct Preference Optimisation (DPO) for model tuning:
https://arxiv.org/abs/2305.18290
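
For reference, a minimal sketch of the per-example DPO loss from the linked paper, in plain Python. It assumes the summed log-probabilities of the chosen and rejected responses have already been computed under both the policy being tuned and a frozen reference model. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative choices, not part of panml.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    All inputs are sequence-level log-probabilities (floats).
    Names and the default beta are hypothetical, for illustration only.
    """
    # How much more the policy favours each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference gives a margin of 0, so the loss is log(2)
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)

# A policy that has shifted probability towards the chosen response scores lower
improved = dpo_loss(-0.5, -3.0, -1.0, -2.0)
```

In an actual tuning loop this quantity would be averaged over a batch of preference pairs and minimised with respect to the policy parameters only, with the reference model kept fixed.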