DeepSpeedExamples

Stage-3 code for the PPO loss may be wrong

Open LuciusMos opened this issue 2 years ago • 1 comments

In applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py, critic_loss and actor_loss are added to each other, which looks wrong. I am confused by this.

image

LuciusMos avatar Apr 20 '23 09:04 LuciusMos

@LuciusMos thanks a lot! It won't affect the training, but it does affect the training log. Would you like to create a PR to resolve the issue? Thank you so much.

yaozhewei avatar Apr 24 '23 04:04 yaozhewei
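To illustrate the point above that mixing the two losses distorts only the log, here is a minimal sketch. It is not the code from main.py: the function names and the loss values are hypothetical, and it assumes the accumulated values are used only for reporting, after the actor and critic have each been updated from their own loss.

```python
# Hypothetical (actor_loss, critic_loss) pairs, one per training step.
# In real training these would be returned by the actor and critic updates.
step_losses = [(0.5, 2.0), (0.25, 1.0)]


def mixed_average(step_losses):
    """Problematic logging pattern: both losses land in one running sum."""
    total = 0.0
    for actor_loss, critic_loss in step_losses:
        total += actor_loss + critic_loss
    return total / len(step_losses)


def separate_averages(step_losses):
    """Clearer logging pattern: one running sum per loss."""
    actor_sum = 0.0
    critic_sum = 0.0
    for actor_loss, critic_loss in step_losses:
        actor_sum += actor_loss
        critic_sum += critic_loss
    n = len(step_losses)
    return actor_sum / n, critic_sum / n


if __name__ == "__main__":
    print("mixed:", mixed_average(step_losses))
    print("separate (actor, critic):", separate_averages(step_losses))
```

Under these assumptions the model updates are identical in both versions. Only the reported numbers differ, and the mixed value no longer corresponds to either loss.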

@LuciusMos which dataset did you use? How is labelling done for steps 2 and 3?

jaideep11061982 avatar May 03 '23 03:05 jaideep11061982

@yaozhewei I see this has already been fixed in a PR, so I will close this issue.

LuciusMos avatar May 04 '23 07:05 LuciusMos