accelerated-scan
Log-space version
Feng et al. proposed a log-space implementation of parallel scan for improved numerical stability. It should be fairly easy to implement, but my CUDA skills are a bit rusty, so before I attempt an implementation myself I wanted to ask whether you already have this planned.
Hi @kklemon! I haven't needed to write CUDA implementations of log-space scans, because I was usually able to stabilize the recurrence and use plain addition instead of logsumexp/logadd.
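
For anyone landing on this thread, here is a rough sketch of what a log-space scan for the recurrence `h_t = a_t * h_{t-1} + b_t` can look like. It is only an illustration of the general cumulative-sum plus cumulative-logsumexp trick, not code from this repository and not necessarily identical to what Feng et al. describe. It assumes `a_t > 0`, `b_t > 0` and `h_0 = 0`; the names `logaddexp`, `log_space_scan` and `reference_scan` are made up for this example, and `itertools.accumulate` is a sequential stand-in for what would be a parallel scan on the GPU.

```python
import math
from itertools import accumulate


def logaddexp(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    m = max(x, y)
    return m + math.log1p(math.exp(-abs(x - y)))


def log_space_scan(log_a, log_b):
    """Return log(h_t) for h_t = a_t * h_{t-1} + b_t with h_0 = 0.

    Assumes a_t > 0 and b_t > 0 so that both logs are defined.
    """
    # a_star[t] = sum_{j <= t} log a_j (log of the cumulative product of a)
    a_star = list(accumulate(log_a))
    # Divide each b_k by the cumulative product up to k, in log space
    shifted = [lb - s for lb, s in zip(log_b, a_star)]
    # Cumulative logsumexp; on a GPU this would be a parallel scan over logaddexp
    lcse = list(accumulate(shifted, logaddexp))
    # log h_t = a_star[t] + logcumsumexp(log b - a_star)[t]
    return [s + l for s, l in zip(a_star, lcse)]


def reference_scan(a, b):
    """Plain sequential recurrence in linear space, for comparison."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


a = [0.5, 0.9, 0.1, 0.7]
b = [1.0, 2.0, 0.5, 3.0]
log_h = log_space_scan([math.log(x) for x in a], [math.log(x) for x in b])
h = [math.exp(x) for x in log_h]
```

The point of the design is that both passes are associative scans (one over `+`, one over `logaddexp`), so each can be parallelized the same way as the existing scan. The cost is one `exp`/`log1p` per combine step, which is what the reply above avoids by stabilizing the recurrence and sticking with addition.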