Siyuan Zhu

5 comments by Siyuan Zhu

> > > Instead of using a linear predictor, GenRM leverages CoT and next-token prediction to provide reward. GenRM is proven to be more accurate. https://arxiv.org/abs/2410.12832 > > > >...

Hi all, if I want to call a reward model inside my reward function to generate the reward, what is the current best practice? Thanks!

> If I want to use GRPO, can I use an RM?

I think writing your own reward function and calling the RM inside it should be enough.
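
A minimal sketch of that idea, assuming the trainer accepts a reward function that takes a batch of prompts and completions and returns one float per sample. The thread does not say which framework or signature applies, so `make_rm_reward`, `score_fn`, and the `(prompts, completions)` signature are hypothetical names chosen for illustration, and `toy_length_scorer` is only a stand-in for a real reward model's forward pass.

```python
from typing import Callable, List, Sequence

# Hypothetical type: any callable that scores (prompt, completion) pairs.
# In practice this would wrap the reward model's tokenizer and forward pass.
ScoreFn = Callable[[Sequence[str], Sequence[str]], Sequence[float]]


def make_rm_reward(score_fn: ScoreFn, batch_size: int = 8):
    """Wrap a reward-model scorer as a plain reward function (illustrative)."""

    def reward_fn(prompts: Sequence[str], completions: Sequence[str]) -> List[float]:
        if len(prompts) != len(completions):
            raise ValueError("prompts and completions must have the same length")
        rewards: List[float] = []
        # Score in mini-batches so a large rollout does not hit the RM all at once.
        for start in range(0, len(prompts), batch_size):
            batch_prompts = prompts[start:start + batch_size]
            batch_completions = completions[start:start + batch_size]
            scores = score_fn(batch_prompts, batch_completions)
            rewards.extend(float(s) for s in scores)
        return rewards

    return reward_fn


def toy_length_scorer(prompts: Sequence[str], completions: Sequence[str]) -> List[float]:
    """Stand-in for a real RM: rewards longer completions, capped at 1.0."""
    return [min(len(c.split()), 10) / 10.0 for c in completions]


reward_fn = make_rm_reward(toy_length_scorer, batch_size=2)
```

To use a real RM, replace `toy_length_scorer` with a function that runs the model on each pair and returns its scalar scores; the wrapper itself stays the same.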