ColossalAI
train_reward_model loss is random
I used the default rm_static dataset with train_data set to 75000 and batch_size 4, on a machine with 8 A100 80 GB GPUs. After 13 hours of training, the loss still fluctuates randomly, although the training itself seems to run without problems.
base model: opt-iml-max-1.3b

why?
Thank you for your feedback. We do not recommend using the loss to evaluate training progress in the RM training task. The paper shows that the loss stays in the range 0.4~0.7. We will soon add evaluation based on accuracy and the distance between positive and negative pairs.
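To illustrate why the loss alone is hard to read, here is a minimal sketch of the metrics mentioned above. It assumes the common pairwise log-sigmoid ranking loss, -log(sigmoid(r_chosen - r_rejected)), which may differ from the loss the training script actually uses. The function name `pairwise_rm_metrics` and the dictionary keys are hypothetical and not part of ColossalAI. Under this assumption, a model that cannot separate the pairs at all gives a loss of ln 2 (about 0.693), so a loss hovering near the top of the 0.4~0.7 range is close to chance level, and accuracy and distance are more informative.

```python
import math


def pairwise_rm_metrics(chosen_rewards, rejected_rewards):
    """Compute loss, accuracy and mean distance for reward-model pairs.

    Hypothetical helper, not a ColossalAI API. Assumes the pairwise
    log-sigmoid ranking loss: -log(sigmoid(chosen - rejected)).
    """
    if not chosen_rewards or len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("need two non-empty reward lists of equal length")

    # Reward margin for each (chosen, rejected) pair.
    diffs = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]

    # -log(sigmoid(d)) == log(1 + exp(-d)), written in a numerically stable form.
    losses = [max(-d, 0.0) + math.log1p(math.exp(-abs(d))) for d in diffs]

    n = len(diffs)
    return {
        "loss": sum(losses) / n,             # mean pairwise ranking loss
        "acc": sum(d > 0 for d in diffs) / n,  # fraction of pairs ranked correctly
        "dist": sum(diffs) / n,              # mean reward margin, chosen minus rejected
    }


if __name__ == "__main__":
    # A model that scores both responses identically sits at loss = ln 2.
    print(pairwise_rm_metrics([0.0, 0.0], [0.0, 0.0]))
    # A model that separates the pairs shows higher accuracy and distance.
    print(pairwise_rm_metrics([1.5, 0.8, 2.0], [0.2, 1.0, -0.5]))
```

Tracking `acc` and `dist` on a held-out split over time should show whether the reward model is learning even when the per-step loss looks noisy.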
We have made many updates since then. This issue was closed due to inactivity. Thanks.