Results: 3 comments of benjpau
Same problem
Did you solve this problem yet? Thank you!
Hi! @lyzKF I had the same confusion about this part. After some research, I found that this seems to be a common practice when applying PPO to LLM reinforcement learning....