LMOps
【MiniLLM】Is it normal to get negative losses at some steps?
First, excellent work! I am trying to reproduce the results using my own data and have changed some of your code. During training, at some steps I get negative rl_loss, reg_loss, and pg_loss. Is this normal behaviour?
It seems abnormal to get negative losses.
- pg_loss and the reward have opposite signs (see this function), and the reward equals log p, which is non-positive. Therefore, pg_loss should be non-negative.
- reg_loss can be viewed as the token-level reverse KLD between the teacher and student distributions, which is non-negative by definition.
- rl_loss is simply pg_loss + reg_loss, so it should be non-negative as well (see the sketch below).
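
To make the sign argument concrete, here is a minimal PyTorch sketch of how the three terms relate. This is not the actual MiniLLM implementation (the real pg_loss also involves importance weights and advantages); the function name, tensor names, and shapes are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def sketch_losses(teacher_logits, student_logits, sampled_ids):
    """Illustrative sketch (not the MiniLLM code) of why each loss term is >= 0.

    teacher_logits / student_logits: [batch, seq_len, vocab] per-token logits (assumed shapes).
    sampled_ids: [batch, seq_len] tokens sampled from the student policy.
    """
    teacher_logprobs = F.log_softmax(teacher_logits, dim=-1)
    student_logprobs = F.log_softmax(student_logits, dim=-1)

    # Reward is the teacher's log-probability of the sampled tokens: log p <= 0.
    reward = torch.gather(teacher_logprobs, -1, sampled_ids.unsqueeze(-1)).squeeze(-1)

    # pg_loss has the opposite sign of the reward, so it is >= 0 in this simplified form.
    pg_loss = -reward.mean()

    # reg_loss: token-level reverse KLD KL(student || teacher), which is >= 0 by definition.
    student_probs = student_logprobs.exp()
    reg_loss = (student_probs * (student_logprobs - teacher_logprobs)).sum(-1).mean()

    # rl_loss is simply the sum of the two non-negative terms.
    rl_loss = pg_loss + reg_loss
    return pg_loss, reg_loss, rl_loss
```

If any of these quantities comes out negative in your run, it is worth checking whether the modified code still computes the reward from teacher log-probabilities and whether the KLD term is aggregated over the full vocabulary.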