
about b.out in newmodel.py

Open menggehe opened this issue 5 years ago • 3 comments

Hi, I have a question. I don't understand why you use b.out in your code, i.e. `e = self.emb(outp).transpose(0,1)` in newmodel.py. Since b.out is the ground truth, does this mean you are using ground-truth information during training?

menggehe avatar Mar 30 '20 13:03 menggehe

This is a standard training technique called "teacher forcing" where the model is trained with the gold sequence prefix. You'll notice that in the inference code (beam_generate function lines 157 to 170) we do not use the gold labels, but instead feed the previous predicted token to the model.
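To make the distinction concrete, here is a minimal toy sketch (not the GraphWriter code; `toy_model` is a hypothetical stand-in for the decoder) contrasting teacher-forced training inputs with free-running inference:

```python
# Minimal sketch of teacher forcing vs. free-running decoding.
# `toy_model` is a hypothetical next-token predictor, NOT the repo's model.

def toy_model(prefix):
    # Toy "model": predicts the next token as last token + 1 (mod 5).
    return (prefix[-1] + 1) % 5

gold = [0, 1, 2, 3, 4]

# Training (teacher forcing): every step conditions on the GOLD prefix.
teacher_forced_inputs = [gold[:i] for i in range(1, len(gold))]
predictions_tf = [toy_model(p) for p in teacher_forced_inputs]

# Inference (as in beam search): each step conditions on PREVIOUS PREDICTIONS.
generated = [gold[0]]                # start from a given first token
for _ in range(len(gold) - 1):
    generated.append(toy_model(generated))

print(predictions_tf)  # → [1, 2, 3, 4]  (each step saw the gold prefix)
print(generated)       # → [0, 1, 2, 3, 4]
```

Note that even if the model mispredicts a token during teacher-forced training, the next step still receives the correct gold prefix, so errors do not compound the way they can at inference time.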

rikdz avatar Mar 30 '20 18:03 rikdz

As I understand it, in teacher forcing the loss at step i is (roughly) a comparison between the gold token y_i and the model's prediction P(\hat{y}_i | y_0 ... y_{i-1}), where the y are gold tokens and the \hat{y} are model predictions. Thus each prediction is conditioned on the gold prefix sequence. It's possible to instead condition on the predicted prefix \hat{y}_0 ... \hat{y}_{i-1}, but this takes longer to converge.
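The per-step objective above is typically realized as a cross-entropy loss, -log P(y_i | y_0 ... y_{i-1}), summed over steps. A toy sketch (a hypothetical probability model, not the repository's code) of computing that loss under teacher forcing:

```python
# Sketch of the teacher-forcing cross-entropy loss:
#   loss_i = -log P(y_i | y_0 ... y_{i-1}), conditioning on the GOLD prefix.
# `toy_next_token_probs` is a hypothetical stand-in for the decoder.
import math

def toy_next_token_probs(prefix, vocab_size=5):
    # Toy model: puts probability 0.7 on (last token + 1) % vocab_size,
    # spreading the remaining 0.3 uniformly over the other tokens.
    target = (prefix[-1] + 1) % vocab_size
    rest = 0.3 / (vocab_size - 1)
    return [0.7 if t == target else rest for t in range(vocab_size)]

gold = [0, 1, 2, 3, 4]
loss = 0.0
for i in range(1, len(gold)):
    probs = toy_next_token_probs(gold[:i])  # condition on the gold prefix
    loss += -math.log(probs[gold[i]])       # -log P(y_i | gold prefix)
loss /= len(gold) - 1
print(round(loss, 4))  # → 0.3567  (each step assigns 0.7 to the gold token)
```

Conditioning on the predicted prefix instead (e.g. scheduled sampling) would replace `gold[:i]` with the model's own previous outputs, which is closer to inference conditions but, as noted above, slower to converge.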

rikdz avatar Mar 30 '20 18:03 rikdz

OK! I got it! Thank you!

menggehe avatar Mar 31 '20 03:03 menggehe