Hi, a question about the sliced-up version of self-attention:
In the blog you say there is a more efficient way of implementing it ("see lecture at the top"). Do you mean the YouTube video at the top?
But there is no code explanation in the video. Do I have to watch the video and implement it myself, or is there any blog about the more efficient way of doing self-attention?
Thanks a lot!
This refers to slides 25 and 26 in this lecture: https://dlvu.github.io/slides/dlvu.lecture12.pdf Slide 25 shows the basic idea of multi-head self-attention, and slide 26 shows how to implement it efficiently.
This is implemented in the default self-attention here: https://github.com/pbloem/former/blob/ce7af9dce65d294d973974113f1e5e0aa45bad4e/former/modules.py#L9
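For reference, here is a minimal sketch of the trick those slides describe, written from the slides rather than copied from `modules.py` (the class name and the exact scaling are my own choices, so details may differ from the repo). The idea: instead of running `h` separate attention operations, project queries, keys, and values for all heads at once, then fold the head dimension into the batch dimension so a single `torch.bmm` computes attention for every head.

```python
import torch
import torch.nn.functional as F
from torch import nn

class MultiHeadSelfAttention(nn.Module):
    """Sketch of efficient multi-head self-attention:
    heads are folded into the batch dimension."""

    def __init__(self, emb, heads=8):
        super().__init__()
        assert emb % heads == 0, 'embedding dim must be divisible by number of heads'
        self.heads = heads
        # one linear map each produces the queries/keys/values for *all* heads at once
        self.toqueries = nn.Linear(emb, emb, bias=False)
        self.tokeys    = nn.Linear(emb, emb, bias=False)
        self.tovalues  = nn.Linear(emb, emb, bias=False)
        self.unify     = nn.Linear(emb, emb)

    def forward(self, x):
        b, t, e = x.size()
        h = self.heads
        s = e // h  # dimension per head

        # project once, then cut the embedding into h chunks of size s
        queries = self.toqueries(x).view(b, t, h, s)
        keys    = self.tokeys(x).view(b, t, h, s)
        values  = self.tovalues(x).view(b, t, h, s)

        # fold heads into the batch dimension: (b, t, h, s) -> (b*h, t, s)
        queries = queries.transpose(1, 2).contiguous().view(b * h, t, s)
        keys    = keys.transpose(1, 2).contiguous().view(b * h, t, s)
        values  = values.transpose(1, 2).contiguous().view(b * h, t, s)

        # one batched matmul computes the attention weights for every head
        dot = torch.bmm(queries, keys.transpose(1, 2)) / (s ** 0.5)
        dot = F.softmax(dot, dim=2)

        # apply attention, then unfold the heads and unify them
        out = torch.bmm(dot, values).view(b, h, t, s)
        out = out.transpose(1, 2).contiguous().view(b, t, h * s)
        return self.unify(out)
```

The payoff is that the matrix multiplications stay large and few: rather than looping over heads (slide 25), everything runs as two batched matmuls over a batch of size `b*h`, which is much friendlier to the GPU.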