Peter Herrera
Results
1
comments of
Peter Herrera
Each query should attend to all keys and normalize over them is right, this is what softmax(dim=2) achieves in [batch, query, key, heads] format. Each query attends over all keys,...