
No positional information for the first self-attention block

Open congwang093 opened this issue 2 years ago • 0 comments

Hi, thanks for your hard work. I read the paper, and if I understand correctly, the first transformer block doesn't have any positional information. Would this cause any issues when passing information on to the rest of the blocks, since self-attention modules usually come with some positional information? Have you tried any other relative positional encoding methods to fill in the gap for the first block?
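
For reference, here is a minimal sketch of how I understand the design from the paper: the PEG (a depthwise convolution over the reshaped token map) is only applied after the first encoder block, so block 0 attends with no positional signal at all. This is just my own simplified PyTorch illustration (class names, dimensions, and the use of `nn.TransformerEncoderLayer` are my assumptions, not the repo's actual code):

```python
import torch
import torch.nn as nn


class PEG(nn.Module):
    """Sketch of a Positional Encoding Generator: depthwise conv over the 2D token map."""

    def __init__(self, dim, k=3):
        super().__init__()
        # Depthwise conv with zero padding; its output is added to the tokens
        # as an implicit, input-conditioned positional signal.
        self.proj = nn.Conv2d(dim, dim, k, stride=1, padding=k // 2, groups=dim)

    def forward(self, x, H, W):
        # x: (B, N, C) tokens, with N == H * W (class token omitted for simplicity)
        B, N, C = x.shape
        feat = x.transpose(1, 2).reshape(B, C, H, W)
        return self.proj(feat).flatten(2).transpose(1, 2) + x


class TinyCPVT(nn.Module):
    """Toy encoder stack showing where the PEG sits relative to the blocks."""

    def __init__(self, dim=192, depth=4, heads=3):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(depth)
        ])
        self.peg = PEG(dim)

    def forward(self, x, H, W):
        for i, blk in enumerate(self.blocks):
            x = blk(x)
            if i == 0:
                # PEG is inserted only after the first block, so the attention
                # inside block 0 sees no positional information.
                x = self.peg(x, H, W)
        return x


# Example: 14x14 token grid, batch of 2
model = TinyCPVT()
tokens = torch.randn(2, 14 * 14, 192)
out = model(tokens, 14, 14)
```

My question is essentially about that `if i == 0` branch: whether the lack of any positional signal inside the first block matters in practice, or whether the conditional encoding added afterwards is enough.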

congwang093 · Oct 25 '23