train_reward_model loss is random

Open ipackhu opened this issue 2 years ago • 1 comments

I am using the default rm_static dataset, with train_data set to 75000 and batch_size 4, on a machine with 8 A100 80G GPUs. After 13 hours of training, the loss still looks random. Apart from that, the training itself seems to run without problems.

base model: opt-iml-max-1.3b

Why does the loss look random?

Feb 25 '23 00:02 ipackhu

Thank you for your feedback. We do not recommend using the loss to evaluate training progress in the reward model (RM) training task. The paper shows that the loss stays in the range 0.4~0.7. We will soon add evaluation based on accuracy and on the distance between positive-negative pairs.
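To illustrate the metrics mentioned above, here is a minimal sketch in plain Python. It is not the repository's code: the function name `pairwise_rm_metrics` and the sample scores are hypothetical, and it assumes the common pairwise ranking loss `-log(sigmoid(r_chosen - r_rejected))`, which may differ from the loss this trainer actually uses. Under that assumption, a model that cannot separate the pairs at all gives a loss of ln 2 ≈ 0.693, so a loss hovering near the top of the 0.4~0.7 range can be hard to interpret without accuracy and distance.

```python
import math


def pairwise_rm_metrics(chosen_rewards, rejected_rewards):
    """Return (mean_loss, accuracy, mean_distance) for paired reward scores.

    Hypothetical helper for illustration only. Assumes the pairwise loss
    -log(sigmoid(chosen - rejected)) for each pair.
    """
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("need two non-empty lists of equal length")

    total_loss = 0.0
    total_distance = 0.0
    correct = 0
    for chosen, rejected in zip(chosen_rewards, rejected_rewards):
        margin = chosen - rejected
        # -log(sigmoid(margin)) == softplus(-margin), in a numerically stable form
        total_loss += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        total_distance += margin
        # A pair counts as correct when the chosen response scores higher
        if margin > 0:
            correct += 1

    n = len(chosen_rewards)
    return total_loss / n, correct / n, total_distance / n


# Made-up reward scores for four (chosen, rejected) pairs
chosen = [1.2, 0.3, -0.5, 2.0]
rejected = [0.4, 0.6, -1.0, 0.5]
loss, acc, dist = pairwise_rm_metrics(chosen, rejected)
print(f"loss={loss:.4f} acc={acc:.2f} distance={dist:.3f}")
```

Tracking accuracy and mean distance on a held-out set shows whether the model ranks chosen responses above rejected ones, even when the per-step loss looks noisy.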

Mar 02 '23 09:03 ht-zhou

We have made many updates since then. This issue has been closed due to inactivity. Thanks.

Apr 26 '23 07:04 binmakeswell