The PPO loss code in DeepSpeedExamples Stage-3 may be incorrect

In applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py, critic_loss and actor_loss are added to each other, which looks wrong to me. I don't understand why the two losses are combined.

Apr 20 '23 09:04 LuciusMos

@LuciusMos thanks a lot! It won't affect training, but it does affect the training log. Would you like to create a PR to resolve the issue? Thank you so much.
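For illustration only, here is a minimal Python sketch of keeping the two losses in separate accumulators for logging. It assumes each training step yields one actor loss and one critic loss as plain floats. The function name `summarize_losses` and the sample values are hypothetical and are not taken from main.py.

```python
def summarize_losses(step_losses):
    """Average actor and critic losses separately for logging.

    step_losses: iterable of (actor_loss, critic_loss) float pairs,
    standing in for the values a training step would return
    (hypothetical input shape, not the real main.py interface).
    """
    actor_sum = 0.0
    critic_sum = 0.0
    count = 0
    for actor_loss, critic_loss in step_losses:
        # One accumulator per loss; neither loss is added into the other.
        actor_sum += actor_loss
        critic_sum += critic_loss
        count += 1
    if count == 0:
        return 0.0, 0.0
    return actor_sum / count, critic_sum / count


# Made-up sample values, used only to show the call.
actor_avg, critic_avg = summarize_losses([(0.5, 1.0), (0.25, 2.0)])
print(f"actor_loss: {actor_avg:.3f} | critic_loss: {critic_avg:.3f}")
```

With separate accumulators, each logged number reflects only its own model's loss.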

Apr 24 '23 04:04 yaozhewei

@LuciusMos which dataset did you use? How is the labelling done for steps 2 and 3?

May 03 '23 03:05 jaideep11061982

@yaozhewei I see this has already been fixed in a PR, so I will close this issue.

May 04 '23 07:05 LuciusMos