align-anything
[Question] Scripts and code for training the CM and RM models for Safe RLHF-V
Required prerequisites
- [x] I have read the documentation https://align-anything.readthedocs.io.
- [x] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- [x] Consider asking first in a Discussion.
Questions
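I found the PPO training script for Safe RLHF-V, but I have not been able to find the training scripts for the RM and CM that it requires. Could you give me some guidance?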
Hi @Tunanzzz, thanks for your interest in the project!
For the Reward Model (RM) and Cost Model (CM) training scripts, you can refer to the implementation in this pull request: #203.
Let us know if you have any further questions!