benjpau

3 comments by benjpau

Have you solved this problem yet? Thank you!

Hi! @lyzKF I had the same confusion about this part. After some research, I found that this seems to be a common practice when applying PPO to LLM reinforcement learning....