saint
Implementation of Attention module in Transformer
Thank you for sharing your work; it has been helping me a lot. I have a question about your code, specifically the Attention module of the Transformer. Am I right that the Attention module should have a dropout layer after the softmax function (link)? For example, in link and link, a dropout layer is used inside the Attention module.
What you mentioned is indeed a common practice.
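For reference, here is a minimal sketch of what the question is describing: scaled dot-product attention with dropout applied to the attention weights immediately after the softmax. This is an illustrative PyTorch example, not the repo's actual code; the class name `Attention` and parameters `dim`, `heads`, and `dropout` are assumptions for the sketch.

```python
import torch
import torch.nn as nn


class Attention(nn.Module):
    """Minimal multi-head self-attention with dropout on the
    attention weights right after the softmax (the step under
    discussion). Illustrative only; names are hypothetical."""

    def __init__(self, dim, heads=8, dropout=0.1):
        super().__init__()
        assert dim % heads == 0, "dim must be divisible by heads"
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.attn_drop = nn.Dropout(dropout)  # dropout after softmax
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        h = self.heads
        # Project to queries, keys, values and split into heads.
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, n, h, d // h).transpose(1, 2) for t in qkv)
        # Scaled dot-product attention scores.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        # Softmax, then dropout on the attention weights.
        attn = self.attn_drop(attn.softmax(dim=-1))
        # Weighted sum of values, merge heads back.
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)


# Usage sketch: dropout on the weights is active only in training mode.
x = torch.randn(2, 16, 64)
layer = Attention(dim=64, heads=8, dropout=0.1)
y = layer(x)  # shape (2, 16, 64)
```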