Dan Fu
This is something we're very interested in and still working on! We don't have a formula for it quite yet.
We've seen that we can match self-attention in quality with some gated convolutions (see the paper for details). Cross-attention is still an open problem, which we'll be working...
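For intuition, a gated long convolution of the kind mentioned above can be sketched in a few lines. This is a minimal NumPy illustration of the general pattern (FFT-based causal convolution followed by an elementwise gate), not the paper's actual implementation, and the function names here are made up for the example:

```python
import numpy as np

def fft_causal_conv(u, k):
    """Causal (linear) convolution of signal u with kernel k via FFT.

    Zero-padding to length 2L turns circular convolution into linear
    convolution; we then keep only the first L outputs.
    """
    L = u.shape[-1]
    n = 2 * L
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(k, n=n), n=n)
    return y[..., :L]

def gated_conv(u, k, v):
    """Gated convolution: an elementwise gate v modulates the conv output."""
    return v * fft_causal_conv(u, k)
```

The elementwise gate is what lets these layers mimic some of the data-dependent mixing that self-attention provides, while the FFT keeps the long convolution O(L log L) instead of O(L^2).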
I’ve never seen this before… maybe try bumping your CUDA or PyTorch versions? Torch FFT occasionally has errors since it’s rarely used. If you can get a minimal reproduction script in...