pycuda hangs on arrays bigger than 17 GB
**Describe the bug**

It seems that pycuda is unable to operate on arrays bigger than 17 GB. Allocation (with `gpuarray.empty` or `gpuarray.zeros`) succeeds, but any subsequent operation on the array hangs (no crash).
**To Reproduce**

```python
import pycuda.autoinit
import pycuda.gpuarray as garray

n_frames = 1024
dz = garray.zeros((n_frames, 2048, 2048), "f")  # allocation works
dz += 1  # hangs forever
```
The same goes for a custom `ElementwiseKernel` applied to this array: the operation hangs but does not crash.
The limit seems to be `2**34` bytes, meaning that `n_frames = 1023` should work in the above example.
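To make the threshold concrete, here is the size arithmetic (plain Python, no GPU needed):

```python
import numpy as np

itemsize = np.dtype("f").itemsize  # float32 -> 4 bytes

def nbytes(n_frames):
    """Size in bytes of a (n_frames, 2048, 2048) float32 array."""
    return n_frames * 2048 * 2048 * itemsize

print(nbytes(1024) == 2**34)  # True: 1024 frames is exactly 2**34 bytes (~17.2 GB)
print(nbytes(1023) < 2**34)   # True: 1023 frames is just under the limit
```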
Doing the same with a plain C/CUDA program works (I can provide source code if needed).
Tried with the following configurations:
- Ubuntu 20.04, python 3.8, numpy 1.22.4, pycuda 2021.1 - Tesla A40, Cuda 11.4, driver 470.141
- Ubuntu 20.04, python 3.8, numpy 1.23.2, pycuda 2022.1 - Tesla V100, Cuda 10.1, driver 418.126
- Debian 11, python 3.9, numpy 1.22.4, pycuda 2021.1 - Quadro P6000, Cuda 11.2, driver 460.91
Perhaps it has to do with the use of `int` instead of `unsigned int` or `size_t`, but it looks like pycuda already uses an unsigned type, at least in `get_elwise_module`.
I suspect it's this code snippet you're alluding to:
https://github.com/inducer/pycuda/blob/6f60fe4eccde4ec1d7d1a50719222024d1034876/pycuda/elementwise.py#L56-L59
Have you tried changing those types to something bigger, say `unsigned long`?
Also here:
https://github.com/inducer/pycuda/blob/6f60fe4eccde4ec1d7d1a50719222024d1034876/pycuda/elementwise.py#L106-L109
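For intuition on why the breakdown happens at exactly this size, here is a quick sanity check of the index arithmetic (a sketch in plain Python, independent of pycuda's internals):

```python
n_bad = 1024 * 2048 * 2048  # 2**32 elements -> hangs
n_ok  = 1023 * 2048 * 2048  # works

UINT32_MAX = 2**32 - 1

# 2**32 elements does not fit in an unsigned 32-bit int...
print(n_bad > UINT32_MAX)   # True
print(n_ok <= UINT32_MAX)   # True

# ...so a 32-bit element count wraps around (C-style truncation),
# here to exactly 0:
print(n_bad & 0xFFFFFFFF)   # 0

# A 64-bit unsigned type (unsigned long on 64-bit Linux, or size_t)
# comfortably holds the count:
print(n_bad <= 2**64 - 1)   # True
```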
Thanks @inducer, it seems to solve the problem. Should I open a PR?
Do you think changing these lines is enough to fix this class of problems, i.e. are there other files I should be looking at?
Yes, I'd be happy to consider a PR. Thanks for offering!
If you're up for it, please look over `reduction.py` and `scan.py` for related issues.