
A problem with loss computation.

Open yxdr opened this issue 6 years ago • 1 comments

loss = F.nll_loss(output[1:].view(-1, vocab_size), trg[1:].contiguous().view(-1), ignore_index=pad)

The loss computed by the line above is the average over every time step, which can make the model difficult to train. So I suggest accumulating (summing) the loss over the time steps instead. In my experiments, this made the model easier to train.

yxdr avatar Dec 07 '19 05:12 yxdr

loss = F.nll_loss(output[1:].view(-1, vocab_size), trg[1:].contiguous().view(-1), ignore_index=pad)

The loss computed by the line above is the average over every time step, which can make the model difficult to train. So I suggest accumulating (summing) the loss over the time steps instead. In my experiments, this made the model easier to train.

So, how should the loss be written?

fengxin619 avatar Jun 21 '21 07:06 fengxin619
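A minimal sketch of the suggested change, assuming `output` holds log-probabilities of shape `(trg_len, batch, vocab_size)` and `trg` holds target indices of shape `(trg_len, batch)` (the shapes and dummy data below are illustrative, not from the repo): switching `F.nll_loss` to `reduction='sum'` accumulates the per-token losses instead of averaging them.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes and dummy data (not from the repo).
vocab_size, pad = 10, 0
output = torch.log_softmax(torch.randn(5, 3, vocab_size), dim=-1)
trg = torch.randint(1, vocab_size, (5, 3))

# Original: averages the NLL over every non-pad target token.
mean_loss = F.nll_loss(output[1:].view(-1, vocab_size),
                       trg[1:].contiguous().view(-1),
                       ignore_index=pad)

# Suggested: accumulate (sum) the per-token losses instead.
sum_loss = F.nll_loss(output[1:].view(-1, vocab_size),
                      trg[1:].contiguous().view(-1),
                      ignore_index=pad,
                      reduction='sum')

# The two differ only by a factor of the non-pad token count,
# so summing effectively scales the gradient by sequence length.
n_tokens = (trg[1:] != pad).sum()
assert torch.allclose(sum_loss, mean_loss * n_tokens)
```

Note that the summed loss grows with batch size and sequence length, so the learning rate may need retuning; a common middle ground is to sum per sequence and average over the batch.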