
【MiniLLM】Is it normal to get negative loss at some steps?

Open lllyyyqqq opened this issue 1 year ago • 1 comment

First, excellent work! I am trying to reproduce the results using my own data and have changed some of your code. During training, at some steps I get a negative rl_loss, reg_loss, and pg_loss. Is this normal behaviour?

lllyyyqqq avatar Apr 03 '24 03:04 lllyyyqqq

It seems abnormal to get negative losses.

  • pg_loss and the reward have opposite signs (see this function), and the reward equals log p, which is negative. Therefore, pg_loss should be positive.
  • reg_loss can be viewed as the token-level reverse KLD between the teacher model and the student model, which is non-negative by definition.
  • rl_loss is simply pg_loss + reg_loss, so it should be positive as well.
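The sign argument above can be sketched numerically. This is a hypothetical illustration with made-up values, not the actual MiniLLM code: the reward log p of a sampled token is never positive (probabilities lie in (0, 1]), so a loss with the opposite sign is never negative, and a reverse KLD is non-negative by Gibbs' inequality.

```python
import math

# Reward for each sampled token is log p under the teacher, with p in (0, 1],
# so every reward is <= 0. (Toy probabilities, for illustration only.)
teacher_token_probs = [0.6, 0.2, 0.9]
rewards = [math.log(p) for p in teacher_token_probs]

# pg_loss takes the opposite sign of the reward, hence pg_loss >= 0.
pg_loss = -sum(rewards) / len(rewards)
assert pg_loss >= 0

# reg_loss as a token-level reverse KLD D_KL(q || p) = sum_x q(x) log(q(x)/p(x)),
# which is >= 0 for any pair of distributions (toy student q and teacher p).
student_q = [0.5, 0.3, 0.2]
teacher_p = [0.4, 0.4, 0.2]
reg_loss = sum(q * math.log(q / p) for q, p in zip(student_q, teacher_p))
assert reg_loss >= 0

# rl_loss is the sum of two non-negative terms, so it is non-negative too.
rl_loss = pg_loss + reg_loss
assert rl_loss >= 0
```

If any of these quantities goes negative in a run, it suggests a sign convention or masking difference introduced by the code changes rather than expected behaviour.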

t1101675 avatar Apr 04 '24 02:04 t1101675