
GPU compatibility

Open maximilian-gelbrecht opened this issue 4 years ago • 3 comments

I wonder if it would be realistic and/or a goal to make the library GPU compatible. By that I mean only the crucial part: applying a pre-computed plan to a CUDA/CuArray.

This is probably a bit tricky in the C library. While the FFTW parts could probably be bound to the appropriate CUDA implementations (there is cuFFT), the other plans would need adjustments. Personally, I have no experience with CUDA in C, but I have some in Julia. I looked at the old pure-Julia version of the SH plans, and it seemed at least plausible that this would be doable there, though maybe I overlooked something.
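For concreteness, a minimal sketch of the usage I have in mind. The first two lines follow the existing FastTransforms.jl README; the CuArray application in the last line is the hypothetical, not-yet-supported part:

```julia
using FastTransforms, CUDA

F = sphrandn(Float64, 64, 64)  # random spherical harmonic coefficients
P = plan_sph2fourier(F)        # precompute the plan (existing CPU API)
G = P * CuArray(F)             # hypothetical: apply the plan on the device
```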

maximilian-gelbrecht avatar Apr 29 '21 17:04 maximilian-gelbrecht

I think this would be interesting. CUDA dropped support for macOS, which stymies my involvement. (I guess I would by default be in favour of a different framework like OpenCL or Metal.)

The C library uses real-to-real FFTs, which are not supported in cuFFT (nor in MKL, for that matter!) (https://forums.developer.nvidia.com/t/newbie-to-cufft-how-to-do-real-to-real-transforms/69952), but those are merely a convenience for the programmer, and workarounds could be found.
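For example, a DCT-I (FFTW's `REDFT00`) can be emulated with a real-to-complex FFT of the even-symmetric extension. A minimal sketch with CUDA.jl, where `gpu_dct1` is just a hypothetical helper name:

```julia
using CUDA, CUDA.CUFFT

# Emulate FFTW's REDFT00 (DCT-I) via cuFFT's real-to-complex transform,
# using the even-symmetric extension trick.
function gpu_dct1(x::CuVector{Float64})
    N = length(x)
    y = vcat(x, reverse(x[2:N-1]))  # even-symmetric extension, length 2(N-1)
    return real.(rfft(y))           # length N; matches unnormalized REDFT00
end
```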

I also think that GPU computations are best performed synchronously with the same amount of work across all threads, which is not always the case here.

FYI, SHTns is supposed to work on the GPU https://bitbucket.org/nschaeff/shtns/src/master/

MikaelSlevinsky avatar Apr 30 '21 03:04 MikaelSlevinsky

Thanks for the quick response. I think the argument for GPU computations here is not only the speed-up itself (which seems to be there, judging by the SHTns benchmarks; thank you for the link), but also the case where the transform is used in high-dimensional PDEs or ML models that profit massively from GPUs. The overhead of transferring memory back and forth between host and device would probably be quite costly.
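To illustrate the concern, a sketch of a time-stepping loop that would be forced to round-trip the state through host memory on every step, just to apply a CPU-only transform (`cpu_plan` stands in for any hypothetical precomputed CPU plan):

```julia
using CUDA

u_d = CUDA.rand(Float64, 512, 512)  # PDE state lives on the GPU
for step in 1:1000
    u_h = Array(u_d)                # device → host copy, only to transform
    # v_h = cpu_plan * u_h          # hypothetical CPU-only transform
    u_d = CuArray(u_h)              # host → device copy afterwards
end
```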

CUDA seems to be the best-supported GPU API for Julia, which is why I was assuming it. (I also normally develop on macOS, but luckily I have access to an HPC system with some NVIDIA cards.)

I'll definitely keep an eye open for it.

maximilian-gelbrecht avatar Apr 30 '21 07:04 maximilian-gelbrecht

So if there is a pure Julia implementation, it could easily be put on a GPU to get big gains, e.g. for the FFT.
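A sketch of why pure Julia helps here: if applying a plan is written against generic array operations, the same code runs on CuArrays via dispatch. `DensePlan` is a hypothetical stand-in for a precomputed plan, not a FastTransforms.jl type:

```julia
using CUDA, LinearAlgebra

# Hypothetical plan whose precomputed data is plain array storage;
# applying it reduces to a generic matrix-vector product.
struct DensePlan{M<:AbstractMatrix}
    A::M
end
Base.:*(P::DensePlan, x::AbstractVector) = P.A * x

P  = DensePlan(rand(64, 64))   # plan built on the CPU
x  = rand(64)
y  = P * x                     # CPU path (BLAS)

Pg = DensePlan(CuArray(P.A))   # move the plan's data to the device once
xg = CuArray(x)
yg = Pg * xg                   # GPU path (CUBLAS), same code
```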

AshtonSBradley avatar May 11 '23 04:05 AshtonSBradley