Bug in Euclidean Attention?
Hi, I really like this work and thank you for releasing the code.
I was trying out the Euclidean Attention and noticed what seems to be a bug in the implementation (see here). The implementation differs from the comment in the code, and I do not believe it currently computes the Euclidean distance between features.
The current implementation is:
sim = q @ k.transpose(-1, -2) - 0.5 * q.sum(-1)[..., None] - 0.5 * k.sum(-1)[..., None, :]
I believe it should be:
sim = q @ k.transpose(-1, -2) - 0.5 * q.pow(2).sum(-1)[..., None] - 0.5 * k.pow(2).sum(-1)[..., None, :]
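A quick way to sanity-check the proposed fix: with the squared terms included, the expression reduces to -0.5 times the squared Euclidean distance, since q·k - 0.5‖q‖² - 0.5‖k‖² = -0.5‖q - k‖². A minimal check against torch.cdist (2-D tensors for simplicity; shapes chosen for illustration):

```python
import torch

torch.manual_seed(0)
q = torch.randn(4, 8)  # (num_queries, dim)
k = torch.randn(5, 8)  # (num_keys, dim)

# Proposed fix: dot product minus half the squared norms of q and k
sim = q @ k.transpose(-1, -2) \
    - 0.5 * q.pow(2).sum(-1)[..., None] \
    - 0.5 * k.pow(2).sum(-1)[..., None, :]

# Should equal -0.5 * squared pairwise Euclidean distance
ref = -0.5 * torch.cdist(q, k, p=2).pow(2)
print(torch.allclose(sim, ref, atol=1e-5))  # True
```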
Can you comment on which implementation you used to generate the results in the paper? Thanks!
Hi, thank you for pointing out the bug! Yes, q and k should be squared.
It seems I introduced that bug when I wrote the open-sourced version. I checked my original implementation, and I can confirm that the Euclidean attention results in the paper were produced with the correct formulation (below is what I actually used to compute the sim value):
sim = - 0.5 * torch.sum(qt*qt, -1).unsqueeze(-1) - 0.5 * torch.sum(kt * kt, -1).unsqueeze(-2) + qt @ kt.transpose(-1, -2)
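For completeness, the expression above is term-for-term the same computation as the fix proposed in this issue, just with the terms reordered and written with torch.sum/unsqueeze instead of pow/indexing. A quick equivalence check (random tensors, names as in the snippet above):

```python
import torch

torch.manual_seed(0)
qt = torch.randn(4, 8)  # (num_queries, dim)
kt = torch.randn(5, 8)  # (num_keys, dim)

# Formulation used for the paper's results
sim_paper = -0.5 * torch.sum(qt * qt, -1).unsqueeze(-1) \
            - 0.5 * torch.sum(kt * kt, -1).unsqueeze(-2) \
            + qt @ kt.transpose(-1, -2)

# Fix proposed in this issue
sim_fixed = qt @ kt.transpose(-1, -2) \
    - 0.5 * qt.pow(2).sum(-1)[..., None] \
    - 0.5 * kt.pow(2).sum(-1)[..., None, :]

print(torch.allclose(sim_paper, sim_fixed))  # True
```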