RLHF-Reward-Modeling

How to finetune ARMO model with custom dataset?

Open Helen-Cheung opened this issue 1 year ago • 3 comments

How can I fine-tune the ARMO model with a custom dataset that contains only paired preference data, without multi-objective reward scores? :)

Helen-Cheung avatar Jul 12 '24 03:07 Helen-Cheung

Was wondering about the same question!

nshen7 avatar Jul 12 '24 23:07 nshen7

@Haoxiang-Wang Hi Haoxiang, can you take a look at this?

WeiXiongUST avatar Jul 14 '24 03:07 WeiXiongUST

@Helen-Cheung @nshen7 I will push the code soon. Stay tuned!

Haoxiang-Wang avatar Jul 15 '24 05:07 Haoxiang-Wang

Training code released!

Haoxiang-Wang avatar Sep 18 '24 07:09 Haoxiang-Wang