xtuner
DPO dataset format and loss
This should be quite easy to add for someone who knows the codebase. The biggest problem might be supporting a new dataset format.
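For context, DPO training needs preference pairs rather than single responses. A minimal sketch of what such a format could look like (the field names here are illustrative, not an xtuner convention; TRL's DPO trainer uses a similar `prompt`/`chosen`/`rejected` layout):

```json
{"prompt": "Explain what DPO is.", "chosen": "DPO (Direct Preference Optimization) fine-tunes a model directly on preference pairs...", "rejected": "DPO is a kind of GPU."}
```

Each line is one JSON record pairing a preferred and a dispreferred completion for the same prompt.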
I don't expect I need to link this, but it's a pretty nice implementation of the loss: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L817
Hi @samedii, thanks very much! Implementing DPO requires the corresponding model, data, and trainer (loss) components.
This is a complex task, and we have built a team from our community to implement the RLHF feature for xtuner!
Please stay tuned~