DeepSpeedExamples

The reward value did not increase.

Open Sun-Shiqi opened this issue 1 year ago • 1 comments

When I run the demo (step3_rlhf_finetuning/training_scripts/opt/single_node/run_1.3b.sh) without any changes, the reward does not increase. Is this normal? I would appreciate it if anyone could share a normal reward curve.

Sun-Shiqi, Apr 02 '24 02:04

[Attached image: "download"]

This is my reward curve.

Sun-Shiqi, Apr 02 '24 02:04