benjpau comments


Results: 3 comments by benjpau

The result of the paper cannot be reproduced

Same problem

The result of the paper cannot be reproduced

Did you solve this problem yet? Thank you!

how to understand the code for calculating rewards

Hi! @lyzKF I had the same confusion about this part. After some research, I found that this seems to be a common practice when applying PPO to LLM reinforcement learning....