Jaeman Son

Results: 2 comments by Jaeman Son

I think there is a difference between the math written in the comments and the code. The main difference is that the math applies a linear layer to (attn*context) and concatenates it with the output, whereas...
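For reference, a minimal sketch of the kind of combination step being discussed: a Luong-style attention merge where the attention context and the decoder output are concatenated and passed through a linear layer with a tanh. The class and variable names here are hypothetical, not the actual pytorch-seq2seq implementation.

```python
import torch
import torch.nn as nn

class LuongCombine(nn.Module):
    """Hypothetical sketch: h_tilde = tanh(W_c [context; output])."""
    def __init__(self, hidden_size):
        super().__init__()
        self.linear_out = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, output, context):
        # output:  (batch, seq_len, hidden) decoder hidden states
        # context: (batch, seq_len, hidden) attention-weighted encoder states
        combined = torch.cat((context, output), dim=2)
        return torch.tanh(self.linear_out(combined))

batch, seq_len, hidden = 2, 4, 8
out = LuongCombine(hidden)(torch.randn(batch, seq_len, hidden),
                           torch.randn(batch, seq_len, hidden))
print(out.shape)  # torch.Size([2, 4, 8])
```

Whether the linear layer is applied before or after the concatenation is exactly the kind of discrepancy the comment is pointing at.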

@lethienhoa Yes, the NLLLoss norm term needs to be updated. But I am also confused: why is the loss not divided by norm_term before calling loss.backward()? https://github.com/IBM/pytorch-seq2seq/blob/f146087a9a271e9b50f46561e090324764b081fb/seq2seq/trainer/supervised_trainer.py#L63
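To make the question concrete, here is a toy sketch of what dividing by norm_term before backward() would look like: per-step NLL is summed over a decoded sequence, and the accumulated loss is normalized by the number of target tokens so gradients are scaled per token rather than per sequence. This is an illustration under assumed names (norm_term, etc.), not the trainer's actual code.

```python
import torch
import torch.nn as nn

criterion = nn.NLLLoss(reduction="sum")
vocab, steps, batch = 5, 3, 2

logits = torch.randn(steps, batch, vocab, requires_grad=True)
targets = torch.randint(0, vocab, (steps, batch))

loss = torch.zeros(())
norm_term = 0
for t in range(steps):
    # Sum NLL over the batch at this decoding step.
    log_probs = torch.log_softmax(logits[t], dim=1)
    loss = loss + criterion(log_probs, targets[t])
    norm_term += batch  # tokens contributing at this step

# Normalize by token count *before* backpropagating,
# so the gradient magnitude does not grow with sequence length.
(loss / norm_term).backward()
```

If backward() is instead called on the unnormalized sum, the effective learning rate scales with the number of target tokens in the batch, which is presumably the concern raised in the comment.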