LMOps
【MiniLLM】Is it normal to get negative losses at some steps?
First, excellent work! I am trying to reproduce the results using my own data and have changed some of your code. During training, at some steps I get negative rl_loss, reg_loss, and pg_loss. Is this normal behaviour?
It seems abnormal to get negative losses.
- pg_loss and the reward have opposite signs (see this function), and the reward equals log p, which is non-positive. Therefore, pg_loss should be non-negative.
- reg_loss can be viewed as the token-level reverse KLD between the teacher and student distributions, which is non-negative by definition.
- rl_loss is simply pg_loss + reg_loss, so it should be non-negative as well (see the sketch below).
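
To make the sign argument concrete, here is a minimal PyTorch sketch of how the three terms relate. This is not the actual MiniLLM implementation (the real pg_loss also involves importance weights and advantages); the function name, tensor names, and shapes are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def sketch_losses(teacher_logits, student_logits, sampled_ids):
    """Illustrative sketch (not the MiniLLM code) of why each loss term is >= 0.

    teacher_logits / student_logits: [batch, seq_len, vocab] per-token logits (assumed shapes).
    sampled_ids: [batch, seq_len] tokens sampled from the student policy.
    """
    teacher_logprobs = F.log_softmax(teacher_logits, dim=-1)
    student_logprobs = F.log_softmax(student_logits, dim=-1)

    # Reward is the teacher's log-probability of the sampled tokens: log p <= 0.
    reward = torch.gather(teacher_logprobs, -1, sampled_ids.unsqueeze(-1)).squeeze(-1)

    # pg_loss has the opposite sign of the reward, so it is >= 0 in this simplified form.
    pg_loss = -reward.mean()

    # reg_loss: token-level reverse KLD KL(student || teacher), which is >= 0 by definition.
    student_probs = student_logprobs.exp()
    reg_loss = (student_probs * (student_logprobs - teacher_logprobs)).sum(-1).mean()

    # rl_loss is simply the sum of the two non-negative terms.
    rl_loss = pg_loss + reg_loss
    return pg_loss, reg_loss, rl_loss
```

If any of these quantities comes out negative in your run, it is worth checking whether the modified code still computes the reward from teacher log-probabilities and whether the KLD term is aggregated over the full vocabulary.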