Bug in Euclidean Attention?
Hi, I really like this work and thank you for releasing the code.
I was trying out the Euclidean Attention and noticed what seems to be a bug in the implementation (see here). The implementation differs from the comment in the code, and I do not believe it currently computes the Euclidean distance between features.
The current implementation is:
sim = q @ k.transpose(-1, -2) - 0.5 * q.sum(-1)[..., None] - 0.5 * k.sum(-1)[..., None, :]
I believe it should be:
sim = q @ k.transpose(-1, -2) - 0.5 * q.pow(2).sum(-1)[..., None] - 0.5 * k.pow(2).sum(-1)[..., None, :]
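A quick way to sanity-check the proposed fix: with the squared terms included, the expression reduces to -0.5 times the squared Euclidean distance, since q·k - 0.5‖q‖² - 0.5‖k‖² = -0.5‖q - k‖². A minimal check against torch.cdist (2-D tensors for simplicity; shapes chosen for illustration):

```python
import torch

torch.manual_seed(0)
q = torch.randn(4, 8)  # (num_queries, dim)
k = torch.randn(5, 8)  # (num_keys, dim)

# Proposed fix: dot product minus half the squared norms of q and k
sim = q @ k.transpose(-1, -2) \
    - 0.5 * q.pow(2).sum(-1)[..., None] \
    - 0.5 * k.pow(2).sum(-1)[..., None, :]

# Should equal -0.5 * squared pairwise Euclidean distance
ref = -0.5 * torch.cdist(q, k, p=2).pow(2)
print(torch.allclose(sim, ref, atol=1e-5))  # True
```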
Can you comment on which implementation you used to generate the results in the paper? Thanks!
Hi, thank you for pointing out the bug! Yes, q and k should be squared.
It seems I introduced that bug when I wrote the open-sourced version. I checked my original implementation, and I can confirm that the Euclidean attention results in the paper were produced with the correct formulation (below is what I actually used to compute the sim value):
sim = - 0.5 * torch.sum(qt*qt, -1).unsqueeze(-1) - 0.5 * torch.sum(kt * kt, -1).unsqueeze(-2) + qt @ kt.transpose(-1, -2)
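For completeness, the expression above is term-for-term the same computation as the fix proposed in this issue, just with the terms reordered and written with torch.sum/unsqueeze instead of pow/indexing. A quick equivalence check (random tensors, names as in the snippet above):

```python
import torch

torch.manual_seed(0)
qt = torch.randn(4, 8)  # (num_queries, dim)
kt = torch.randn(5, 8)  # (num_keys, dim)

# Formulation used for the paper's results
sim_paper = -0.5 * torch.sum(qt * qt, -1).unsqueeze(-1) \
            - 0.5 * torch.sum(kt * kt, -1).unsqueeze(-2) \
            + qt @ kt.transpose(-1, -2)

# Fix proposed in this issue
sim_fixed = qt @ kt.transpose(-1, -2) \
    - 0.5 * qt.pow(2).sum(-1)[..., None] \
    - 0.5 * kt.pow(2).sum(-1)[..., None, :]

print(torch.allclose(sim_paper, sim_fixed))  # True
```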