RLHF-Reward-Modeling
How to fine-tune the ARMO model with a custom dataset?
How can I fine-tune the ARMO model with a custom dataset that contains only paired preference data, without multi-objective reward scores? :)
Was wondering about the same question!
@Haoxiang-Wang Hi Haoxiang, can you take a look at this?
@Helen-Cheung @nshen7 I will push the code soon. Stay tuned!
Training code released!
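For anyone landing here with only paired preference data: a common way to train a reward model on such pairs is a Bradley-Terry style pairwise loss, which needs only a scalar score for the chosen and the rejected response. The snippet below is a minimal, framework-free sketch of that loss, not the released training code. The function name `pairwise_preference_loss` and its inputs are hypothetical and used for illustration only; check the repository for the actual training setup.

```python
import math


def pairwise_preference_loss(chosen_scores, rejected_scores):
    """Mean Bradley-Terry loss, -log(sigmoid(chosen - rejected)), over all pairs.

    Hypothetical helper for illustration; the scores are assumed to be the
    scalar rewards a model assigns to the chosen and rejected responses.
    """
    if not chosen_scores or len(chosen_scores) != len(rejected_scores):
        raise ValueError("need two non-empty score lists of equal length")

    total = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(margin)) == softplus(-margin); branch on the sign of
        # the margin so math.exp never receives a large positive argument.
        if margin >= 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(chosen_scores)


# Usage: the loss shrinks as the chosen response is scored further above
# the rejected one.
well_ranked = pairwise_preference_loss([2.0, 1.5], [0.0, -0.5])
badly_ranked = pairwise_preference_loss([0.0, -0.5], [2.0, 1.5])
```

In an actual fine-tuning run the same quantity would be computed on model outputs inside an autodiff framework so that gradients flow back to the trainable parameters.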