Siyuan Zhu comments

Results: 5 comments of Siyuan Zhu

Update README.md

lol

Support Generative Reward Model (GenRM)

> > > Instead of using a linear predictor, GenRM uses chain-of-thought (CoT) reasoning and next-token prediction to produce the reward. GenRM is reported to be more accurate. https://arxiv.org/abs/2410.12832 > > > >...
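
For illustration only: one simple way to turn next-token prediction into a scalar reward is to compare the logits the model assigns to two verdict tokens (say "Yes" and "No") after its reasoning. The sketch below assumes those two logits are already available; the function name `genrm_reward` and the Yes/No verdict format are hypothetical and not taken from the paper or from any library.

```python
import math

def genrm_reward(logit_yes: float, logit_no: float) -> float:
    """Return P("Yes") under a softmax restricted to the two verdict tokens.

    Assumption: the generative reward model has finished its chain of thought
    and we read the next-token logits for "Yes" and "No" at the verdict position.
    """
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)
```

A reward near 1.0 means the model strongly prefers the "Yes" verdict, and equal logits give 0.5.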

[Question] How to call reward model in rule based reward function?

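Hi all, if I want to call a reward model inside the reward function to generate the reward, what is the current best practice? Thanks, everyone.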

[Question] How to call reward model in rule based reward function?

Thank you, that's of great help!

[Question] How to call reward model in rule based reward function?

> If I want to use GRPO, can I use an RM?

I think you can write your own reward function and call the RM inside it; that should work.
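
A minimal sketch of that idea, assuming the training framework accepts a custom reward function that takes batches of prompts and responses and returns one float per response. All names here (`rule_score`, `make_reward_fn`, `dummy_rm`, `rm_weight`) are hypothetical, and `dummy_rm` is only a stand-in for a real reward model call such as a request to a served RM.

```python
from typing import Callable, List

# Signature assumed for the reward model call: (prompts, responses) -> scores.
RMScorer = Callable[[List[str], List[str]], List[float]]

def rule_score(response: str) -> float:
    """Hypothetical rule: 1.0 if the response contains a boxed answer, else 0.0."""
    return 1.0 if "\\boxed{" in response else 0.0

def make_reward_fn(rm_score: RMScorer, rm_weight: float = 0.5):
    """Build a reward function that mixes the rule score with the RM score."""
    def reward_fn(prompts: List[str], responses: List[str]) -> List[float]:
        # One batched RM call per reward computation.
        rm_scores = rm_score(prompts, responses)
        return [
            (1.0 - rm_weight) * rule_score(resp) + rm_weight * score
            for resp, score in zip(responses, rm_scores)
        ]
    return reward_fn

def dummy_rm(prompts: List[str], responses: List[str]) -> List[float]:
    """Stand-in scorer; replace with a real reward model call."""
    return [min(len(resp) / 100.0, 1.0) for resp in responses]

reward_fn = make_reward_fn(dummy_rm, rm_weight=0.5)
rewards = reward_fn(["What is 2 + 2?"], ["\\boxed{4}"])
```

Passing the scorer in as an argument keeps the rule-based part testable without a live model, and lets the RM call be swapped for a remote request later.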