attentions
PyTorch implementation of some attentions for Deep Learning Researchers.
Attention-Based Models for Speech Recognition has a sharpening feature which selects only a few of the hidden states to attend to. However, the paper gives few details about how this selection is done...
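One plausible reading, sketched below under the assumption that "selecting" means keeping only the k largest attention energies per query position and renormalizing, is a top-k mask applied before the softmax. The function name and shapes here are illustrative, not taken from the paper or this repo.

```python
import torch
import torch.nn.functional as F

def topk_sharpened_attention(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k largest energies per query position, renormalize the rest.

    scores: (batch_size, q_len, v_len) unnormalized attention energies.
    Returns attention weights of the same shape.
    """
    # The k-th largest energy along the value axis acts as a threshold.
    topk_vals, _ = scores.topk(k, dim=-1)
    threshold = topk_vals[..., -1:].expand_as(scores)
    # Energies below the threshold get -inf, so the softmax assigns them zero.
    masked = scores.masked_fill(scores < threshold, float("-inf"))
    return F.softmax(masked, dim=-1)
```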
Hello, I think if you want the additive attention to be able to handle batches, while the inputs look like this:

Inputs: query, value
- **query** (batch_size, q_len, hidden_dim): tensor containing...
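A minimal sketch of additive (Bahdanau-style) attention that works on batched 3-D inputs of the shape quoted above; the module below is illustrative (it assumes value doubles as the key) and is not the repo's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BatchedAdditiveAttention(nn.Module):
    """Additive attention over batched inputs (illustrative sketch)."""

    def __init__(self, hidden_dim: int) -> None:
        super().__init__()
        self.query_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.key_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.score_proj = nn.Linear(hidden_dim, 1)

    def forward(self, query: torch.Tensor, value: torch.Tensor):
        # query: (batch_size, q_len, hidden_dim)
        # value: (batch_size, v_len, hidden_dim), used here as the keys too.
        # Broadcast every query position against every value position:
        # (B, Q, 1, D) + (B, 1, V, D) -> (B, Q, V, D)
        energy = torch.tanh(
            self.query_proj(query).unsqueeze(2) + self.key_proj(value).unsqueeze(1)
        )
        score = self.score_proj(energy).squeeze(-1)  # (B, Q, V)
        attn = F.softmax(score, dim=-1)              # attention weights
        context = torch.bmm(attn, value)             # (B, Q, hidden_dim)
        return context, attn
```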
https://github.com/sooftware/attentions/blob/b1dd65be8f12fbe19525bc4ae0dbbc14975778a7/attentions.py#L286 It seems the mask is not correct: since the query, key, and value are permuted before being flattened across heads, the mask should be rearranged to match.
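A minimal sketch of the alignment concern, assuming the head-major flattening (a permute(2, 0, 1, 3) followed by a view to batch_size * num_heads) that the linked file appears to use; shapes and names here are illustrative, and this does not settle whether the repo's code is actually wrong.

```python
import torch

batch_size, num_heads, q_len, v_len, d_head = 2, 4, 5, 7, 16

# Per-head tensors flattened head-major, as permute(2, 0, 1, 3) followed by
# view(batch_size * num_heads, seq_len, d_head) would produce.
query = torch.randn(batch_size * num_heads, q_len, d_head)
key = torch.randn(batch_size * num_heads, v_len, d_head)

score = torch.bmm(query, key.transpose(1, 2))  # (batch * heads, q_len, v_len)

# A (batch, q_len, v_len) padding mask must be tiled consistently with that
# layout: repeating the whole batch block once per head keeps mask row i
# aligned with batch index i % batch_size.
mask = torch.zeros(batch_size, q_len, v_len, dtype=torch.bool)
mask_per_head = mask.repeat(num_heads, 1, 1)   # (batch * heads, q_len, v_len)

score = score.masked_fill(mask_per_head, float("-inf"))
```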