Peter Herrera

Results 1 comments of Peter Herrera

Each query should attend to all keys and normalize over them is right, this is what softmax(dim=2) achieves in [batch, query, key, heads] format. Each query attends over all keys,...