xtuner
DPO dataset format and loss
This should be quite easy to add for someone who knows the codebase. The biggest problem might be supporting a new dataset format.
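For context, DPO training needs preference pairs rather than single responses. A minimal sketch of what such a format could look like (the field names here are illustrative, not an xtuner convention; TRL's DPO trainer uses a similar `prompt`/`chosen`/`rejected` layout):

```json
{"prompt": "Explain what DPO is.", "chosen": "DPO (Direct Preference Optimization) fine-tunes a model directly on preference pairs...", "rejected": "DPO is a kind of GPU."}
```

Each line is one JSON record pairing a preferred and a dispreferred completion for the same prompt.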
I don't expect I need to link this, but it's a pretty nice implementation of the loss: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L817
Hi @samedii, thanks very much! Implementing DPO requires the corresponding model, data, and trainer (loss) components.
This is a complex task, and we have built a team from our community to implement the RLHF feature for xtuner!
Please stay tuned~