Stage-3 PPO loss code may be incorrect
In applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py, critic_loss and actor_loss are added to each other, which looks wrong to me. I don't understand why this is done.
@LuciusMos Thanks a lot! It doesn't affect training, but it does affect the training log. Would you like to create a PR to resolve the issue? Thank you so much.
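To illustrate why cross-adding the two losses would distort only the logged numbers, here is a minimal sketch in plain Python. It is not a quote of main.py: the function names, the float stand-ins for the loss tensors, and the exact shape of the cross-addition are assumptions made for illustration. The sketch assumes the backward pass has already run by the time the values are accumulated, so only the reporting changes.

```python
def log_cross_added(step_losses):
    """Hypothetical reproduction of the pattern described in the issue.

    step_losses: list of (actor_loss, critic_loss) floats, one pair per step.
    Returns the values that would end up in the log.
    """
    logged = []
    for actor_loss, critic_loss in step_losses:
        # Assumed pattern: each loss is added into the other, so neither
        # logged value is the loss that was actually optimized.
        critic_loss += actor_loss
        actor_loss += critic_loss
        logged.append((actor_loss, critic_loss))
    return logged


def log_separate_averages(step_losses):
    """One possible fix: keep a separate running sum per loss."""
    actor_sum = 0.0
    critic_sum = 0.0
    for actor_loss, critic_loss in step_losses:
        actor_sum += actor_loss
        critic_sum += critic_loss
    n = len(step_losses)
    return actor_sum / n, critic_sum / n


# Two made-up steps, purely for illustration.
losses = [(1.0, 2.0), (3.0, 4.0)]
print(log_cross_added(losses))
print(log_separate_averages(losses))
```

With separate accumulators, the logged averages correspond directly to the actor and critic losses, which is easier to interpret when monitoring a run.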
@LuciusMos Which dataset did you use? How is the labelling done for steps 2 and 3?
@yaozhewei I see this has already been fixed in a PR, so I'll close this issue.