chenjc
I would like to know whether you have reproduced the Transformer results reported in the paper.
Hello, this is to let you know that I have received your email.
Did you solve this problem?
@xingyuma618 Excuse me, I have a similar question. Did you resolve this problem?
@xingyuma618 Also, do you understand why the authors use torch.sum(nll, dim=1) rather than torch.mean(nll, dim=1) to calculate the log-likelihood? I think this differs from the calculation described in the paper.
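To make the difference concrete, here is a minimal sketch of what the two reductions compute. It assumes nll holds per-token negative log-likelihoods with shape (batch, seq_len) and no padding. The values are made up, and plain Python lists stand in for tensors, so this is not the authors' code.

```python
# Hypothetical per-token negative log-likelihoods, shape (batch=2, seq_len=4).
# The numbers are invented for illustration only.
nll = [
    [0.5, 1.0, 1.5, 1.0],
    [2.0, 2.0, 2.0, 2.0],
]

# Mirrors torch.sum(nll, dim=1): one total NLL per sequence.
nll_sum = [sum(row) for row in nll]

# Mirrors torch.mean(nll, dim=1): one per-token average NLL per sequence.
nll_mean = [sum(row) / len(row) for row in nll]

print(nll_sum)   # [4.0, 8.0]
print(nll_mean)  # [1.0, 2.0]
```

Under these assumptions the two results differ only by a factor of seq_len: the sum is the NLL of the whole sequence, while the mean is normalized by length. Which one matches the paper depends on whether the reported metric is per sequence or per token.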