Sun-Shiqi issues


Results: 2 issues by Sun-Shiqi

Data

This is really surprising work. I am very curious about the composition of the training data. I would greatly appreciate it if you could share the full details of the training...

The reward value did not increase.

When I run the demo (step3_rlhf_finetuning/training_scripts/opt/single_node/run_1.3b.sh) without any changes, the reward does not increase. Is this normal? I would appreciate it if anyone could provide a normal reward...