[Question] Training scripts and code for the Safe RLHF-V CM and RM models

Open · Tunanzzz opened this issue 7 months ago • 1 comment

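Required prerequisites

  • [x] I have read the documentation https://align-anything.readthedocs.io.
  • [x] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
  • [x] Consider asking first in a Discussion.

Questions

I found the PPO training script for Safe RLHF-V, but I can't find the training scripts for the RM and CM that it requires. Could you give me some guidance?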

Tunanzzz, Jun 18 '25 13:06

Hi @Tunanzzz, thanks for your interest in the project!

For the Reward Model (RM) and Cost Model (CM) training scripts, you can refer to the implementation in this pull request: #203.
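For orientation only, and not taken from #203: preference-based reward models are commonly trained with a pairwise (Bradley-Terry style) loss over a preferred and a rejected response. The sketch below is a minimal, dependency-free illustration of that loss; the function name `pairwise_preference_loss` and its arguments are hypothetical, and the scripts in the pull request may do something different.

```python
import math


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_preferred - score_rejected)).

    Hypothetical helper for illustration; not part of align-anything.
    The loss shrinks as the model scores the preferred response higher
    than the rejected one.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) equals softplus(-m); branch on the sign of m so that
    # math.exp never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Scalar scores a reward model might assign to two responses.
    print(pairwise_preference_loss(1.5, 0.2))  # correct ranking: smaller loss
    print(pairwise_preference_loss(0.2, 1.5))  # wrong ranking: larger loss
```

Under the same assumption, a cost model could reuse this form on safety preference labels, with the response judged less safe placed in the higher-scoring slot.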

Let us know if you have any further questions!

Reindulger, Jun 24 '25 02:06