about b.out in newmodel.py
Hi, I have a question: I don't understand why you use b.out in your code (e = self.emb(outp).transpose(0,1) in newmodel.py). Since b.out is the ground truth, does that mean you are using ground-truth information during training?
This is a standard training technique called "teacher forcing" where the model is trained with the gold sequence prefix. You'll notice that in the inference code (beam_generate function lines 157 to 170) we do not use the gold labels, but instead feed the previous predicted token to the model.
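For concreteness, here is a minimal sketch of the two modes. This is not the repo's actual newmodel.py code; the TinyDecoder class and all of its names (emb, rnn, out_proj, bos_idx) are hypothetical, but the pattern is the same: training conditions on the gold prefix, inference feeds back the model's own predictions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDecoder(nn.Module):
    """Hypothetical decoder illustrating teacher forcing vs. inference."""

    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out_proj = nn.Linear(hidden_size, vocab_size)

    def forward_teacher_forced(self, gold, h0):
        # Training (teacher forcing): every step is conditioned on the
        # *gold* prefix y_0 ... y_{i-1}, so the whole shifted gold
        # sequence can be embedded and fed in a single pass.
        # gold: (batch, seq_len); h0: (1, batch, hidden) for a 1-layer GRU.
        e = self.emb(gold[:, :-1])              # embed gold prefix
        out, _ = self.rnn(e, h0)
        logits = self.out_proj(out)             # predicts y_1 ... y_T
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            gold[:, 1:].reshape(-1),
        )

    @torch.no_grad()
    def greedy_decode(self, h, bos_idx, max_len=20):
        # Inference: no gold labels available; each step feeds back the
        # previous *predicted* token \hat{y}_{i-1} instead.
        tok = torch.full((h.size(1), 1), bos_idx, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            e = self.emb(tok)
            out, h = self.rnn(e, h)
            tok = self.out_proj(out).argmax(-1)  # \hat{y}_i
            outputs.append(tok)
        return torch.cat(outputs, dim=1)
```

The key difference is structural: teacher forcing lets the whole target sequence be processed in one parallel pass (fast, stable gradients), while decoding is necessarily sequential because each input depends on the previous output.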
As I understand it, in teacher forcing the per-step loss is (roughly) P(y_i) - P(\hat{y}_i | y_0 ... y_{i-1}), where the y are gold tokens and the \hat{y} are model predictions. Thus each prediction is conditioned on the gold prefix sequence. It is also possible to condition on the predicted prefix \hat{y}_0 ... \hat{y}_{i-1}, but that takes longer to converge.
OK! I got it! Thank you!