Hi, a question about the sliced-up version of self-attention:
In the blog you say there is a more efficient way of implementing it ("see lecture at the top"). Do you mean the YouTube video at the top?
But there is no code explanation in the video. Do I have to watch the video and implement it myself, or is there any blog about the more efficient way of doing self-attention?
Thanks a lot!
This refers to slides 25 and 26 in this lecture: https://dlvu.github.io/slides/dlvu.lecture12.pdf Slide 25 shows the basic idea of multi-head self-attention, and slide 26 shows how to implement it efficiently.
This is implemented in the default self-attention here: https://github.com/pbloem/former/blob/ce7af9dce65d294d973974113f1e5e0aa45bad4e/former/modules.py#L9
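For reference, here is a minimal sketch of the trick those slides describe, written from the slides rather than copied from `modules.py` (the class name and the exact scaling are my own choices, so details may differ from the repo). The idea: instead of running `h` separate attention operations, project queries, keys, and values for all heads at once, then fold the head dimension into the batch dimension so a single `torch.bmm` computes attention for every head.

```python
import torch
import torch.nn.functional as F
from torch import nn

class MultiHeadSelfAttention(nn.Module):
    """Sketch of efficient multi-head self-attention:
    heads are folded into the batch dimension."""

    def __init__(self, emb, heads=8):
        super().__init__()
        assert emb % heads == 0, 'embedding dim must be divisible by number of heads'
        self.heads = heads
        # one linear map each produces the queries/keys/values for *all* heads at once
        self.toqueries = nn.Linear(emb, emb, bias=False)
        self.tokeys    = nn.Linear(emb, emb, bias=False)
        self.tovalues  = nn.Linear(emb, emb, bias=False)
        self.unify     = nn.Linear(emb, emb)

    def forward(self, x):
        b, t, e = x.size()
        h = self.heads
        s = e // h  # dimension per head

        # project once, then cut the embedding into h chunks of size s
        queries = self.toqueries(x).view(b, t, h, s)
        keys    = self.tokeys(x).view(b, t, h, s)
        values  = self.tovalues(x).view(b, t, h, s)

        # fold heads into the batch dimension: (b, t, h, s) -> (b*h, t, s)
        queries = queries.transpose(1, 2).contiguous().view(b * h, t, s)
        keys    = keys.transpose(1, 2).contiguous().view(b * h, t, s)
        values  = values.transpose(1, 2).contiguous().view(b * h, t, s)

        # one batched matmul computes the attention weights for every head
        dot = torch.bmm(queries, keys.transpose(1, 2)) / (s ** 0.5)
        dot = F.softmax(dot, dim=2)

        # apply attention, then unfold the heads and unify them
        out = torch.bmm(dot, values).view(b, h, t, s)
        out = out.transpose(1, 2).contiguous().view(b, t, h * s)
        return self.unify(out)
```

The payoff is that the matrix multiplications stay large and few: rather than looping over heads (slide 25), everything runs as two batched matmuls over a batch of size `b*h`, which is much friendlier to the GPU.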