[Question] Training scripts and code for the Safe RLHF-V CM and RM models

Open Tunanzzz opened this issue 7 months ago • 1 comments

Required prerequisites

[x] I have read the documentation https://align-anything.readthedocs.io.
[x] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
[x] Consider asking first in a Discussion.

Questions

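I found the PPO training script for Safe RLHF-V, but I have not been able to find the training scripts for the reward model (RM) and cost model (CM) that it depends on. Could you give me some guidance?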

Jun 18 '25 13:06 Tunanzzz

Hi @Tunanzzz, thanks for your interest in the project!

For the Reward Model (RM) and Cost Model (CM) training scripts, you can refer to the implementation in this pull request: #203.

Let us know if you have any further questions!

Jun 24 '25 02:06 Reindulger