Albert Gu

202 comments by Albert Gu

I'm not sure it's worth using KeOps or the custom CUDA kernel for a pedagogical implementation. FYI, you should be able to do well on CIFAR with much smaller models....

I tried using this codebase in the past for SC09 unconditional generation and found that it does not work. An alternative implementation of DiffWave at philsyn/diffwave-unconditional did work. I've released...

We just 0-pad the inputs. Actually, it might be better if you mask out the inputs inside every S4 layer. The following snippet is from the latest model, which you...

> I got a worse output when I 0-padded the inputs as you mentioned in NER task.
Worse performance compared to what? What's the alternative to 0-padding?
> And if...

Right, they should be identical for a unidirectional model. Are you sure the lengths tensor is passed in correctly?
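The padding and masking discussed in these comments can be illustrated with a toy example. This is only a minimal sketch in plain Python, not the snippet from the codebase mentioned above: `causal_conv`, `bidirectional_conv`, and `mask_by_length` are hypothetical helpers, and a plain convolution stands in for an S4 layer. It checks that a causal (unidirectional) convolution gives the same outputs at the valid positions regardless of what is in the padding, while a bidirectional one only does so once the padded positions are zeroed using the sequence length.

```python
def causal_conv(u, k):
    """Causal convolution: y[t] = sum over s <= t of k[s] * u[t - s]."""
    return [
        sum(k[s] * u[t - s] for s in range(min(t + 1, len(k))))
        for t in range(len(u))
    ]


def bidirectional_conv(u, k):
    """Toy bidirectional layer: forward causal pass plus a pass over the reversed input."""
    fwd = causal_conv(u, k)
    bwd = causal_conv(u[::-1], k)[::-1]
    return [a + b for a, b in zip(fwd, bwd)]


def mask_by_length(u, length):
    """Zero out every position at or beyond `length` (the padded region)."""
    return [x if t < length else 0.0 for t, x in enumerate(u)]


# Hypothetical kernel and input, chosen only for illustration.
kernel = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]
seq = [1.0, -2.0, 3.0]
length = len(seq)

zero_padded = seq + [0.0, 0.0, 0.0]
junk_padded = seq + [9.0, 9.0, 9.0]  # padding that was not zeroed

# Unidirectional: valid positions never see the padding.
y_short = causal_conv(seq, kernel)
y_padded = causal_conv(zero_padded, kernel)
y_junk = causal_conv(junk_padded, kernel)

# Bidirectional: padding leaks into valid positions unless it is masked.
bi_short = bidirectional_conv(seq, kernel)
bi_junk = bidirectional_conv(junk_padded, kernel)
bi_masked = bidirectional_conv(mask_by_length(junk_padded, length), kernel)
```

In this sketch `y_padded[:length]` and `y_junk[:length]` both match `y_short`, whereas `bi_junk[:length]` differs from `bi_short` until the input is masked. If a real unidirectional model gives different results with and without padding, the lengths passed to the mask are a natural first thing to check.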

I have no plans at the moment, but I know there are other groups doing BERT-style pretraining with S4.
1. I'm unable to help here without knowing more details. It...

0. The naive algorithm and custom kernel primarily differ in memory usage. The naive algorithm materializes the Cauchy matrix, which needs O(NL) operations and O(NL) space, while the custom kernel...
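To make the memory cost concrete, here is a minimal sketch in plain Python of a Cauchy matrix-vector product, k(z_j) = sum_i v_i / (z_j - w_i), computed two ways. The names `cauchy_naive` and `cauchy_streaming`, and the values of `v`, `w`, and `z`, are assumptions made for illustration. The first function builds the full N-by-L matrix as the naive algorithm is described above; the second is just a loop that accumulates into an output of length L, and it is not a description of what the custom CUDA kernel actually does.

```python
import cmath


def cauchy_naive(v, w, z):
    """Materialize the full N x L Cauchy matrix, then reduce over N."""
    matrix = [[1.0 / (zj - wi) for zj in z] for wi in w]  # O(NL) space
    return [
        sum(v[i] * matrix[i][j] for i in range(len(w)))
        for j in range(len(z))
    ]


def cauchy_streaming(v, w, z):
    """Same O(NL) operations, but only O(L) extra space for the output."""
    out = [0j] * len(z)
    for vi, wi in zip(v, w):
        for j, zj in enumerate(z):
            out[j] += vi / (zj - wi)
    return out


# Hypothetical sizes and values: N poles in the left half-plane,
# L evaluation points on the unit circle.
N, L = 4, 8
w = [complex(-0.5, n) for n in range(N)]
v = [complex(1.0, 0.5 * n) for n in range(N)]
z = [cmath.exp(2j * cmath.pi * j / L) for j in range(L)]

naive = cauchy_naive(v, w, z)
streamed = cauchy_streaming(v, w, z)
max_diff = max(abs(a - b) for a, b in zip(naive, streamed))
```

Both functions perform the same number of arithmetic operations; they differ only in whether the intermediate N-by-L matrix is stored, so `max_diff` should be at the level of floating-point rounding.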

Thanks for pointing this out! I found out-of-date references in 3 places; let me know if there are any more. I plan to let this latest release out...

I looked into this recently and also found the same issue, which wasn't present before. I wasn't able to figure out why. It's weird that it happens randomly. Regardless, the...

I apologize for replying so late; this is an appropriate place to ask questions, but I spent several weeks/months releasing the new preprints and the corresponding V3 of this codebase....