accelerated-scan
gates (A matrix) with a shape of batch * dim * dim * seqlen
Thank you for your excellent work!
I was wondering if it’s possible to modify your code to handle a state-space model case where the gates (A matrix) have a more general shape of batch × dim × dim × seqlen?
Thanks @WeihanLikk! How big is the dim you'd like to use?
Thanks for your reply! In my case, the maximum dim is around 100.
That sounds reasonable! I would write an entirely new kernel for this, though. It could end up simpler, because the thread block layout would be different.
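For reference, here is a minimal NumPy sketch of what such a kernel would need to compute. It assumes the recurrence is h[t] = A[t] @ h[t-1] + x[t] with a zero initial state, gates of shape batch × dim × dim × seqlen, and tokens of shape batch × dim × seqlen. The names `matrix_gated_scan` and `combine` are hypothetical and are not part of accelerated-scan.

```python
import numpy as np


def matrix_gated_scan(gates, tokens):
    """Sequential reference for h[t] = gates[..., t] @ h[t-1] + tokens[..., t].

    Assumed shapes (not taken from the library):
      gates:  (batch, dim, dim, seqlen)
      tokens: (batch, dim, seqlen)
    Returns an array of shape (batch, dim, seqlen). The initial state is zero.
    """
    batch, dim, _, seqlen = gates.shape
    out = np.zeros_like(tokens)
    h = np.zeros((batch, dim), dtype=tokens.dtype)
    for t in range(seqlen):
        # Batched matrix-vector product: (batch, dim, dim) times (batch, dim).
        h = np.einsum("bij,bj->bi", gates[..., t], h) + tokens[..., t]
        out[..., t] = h
    return out


def combine(left, right):
    """Associative operator on (A, b) pairs, each standing for the map h -> A h + b.

    Applying `left` first and `right` second gives
    h -> A_r (A_l h + b_l) + b_r = (A_r A_l) h + (A_r b_l + b_r).
    A parallel scan would apply this operator in a tree instead of a loop.
    """
    A_l, b_l = left
    A_r, b_r = right
    return A_r @ A_l, np.einsum("bij,bj->bi", A_r, b_l) + b_r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, dim, seqlen = 2, 4, 8
    gates = 0.1 * rng.standard_normal((batch, dim, dim, seqlen))
    tokens = rng.standard_normal((batch, dim, seqlen))

    out = matrix_gated_scan(gates, tokens)

    # Folding `combine` left to right must reproduce the sequential result.
    acc = (gates[..., 0], tokens[..., 0])
    for t in range(1, seqlen):
        acc = combine(acc, (gates[..., t], tokens[..., t]))
    print(np.allclose(acc[1], out[..., -1]))
```

Unlike the elementwise case, each `combine` step is a dim × dim matrix product, so with dim around 100 the work and state per scan element are much larger. That is one reason a separate kernel with its own layout may be the easier route.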
Thanks!