pycuda hangs on arrays bigger than 17 GB
**Describe the bug**

It seems that pycuda is unable to operate on arrays bigger than 17 GB. Allocation (with `gpuarray.empty` or `gpuarray.zeros`) succeeds, but any subsequent operation on the array hangs (no crash).
**To Reproduce**

```python
import pycuda.autoinit
import pycuda.gpuarray as garray

n_frames = 1024
dz = garray.zeros((n_frames, 2048, 2048), "f")  # allocation works
dz += 1  # hangs forever
```
The same goes for a custom `ElementwiseKernel` applied to this array: the operation hangs but does not crash.
The limit seems to be `2**34` bytes, meaning that `n_frames = 1023` should work in the above example.
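To make the threshold concrete, here is the size arithmetic (plain Python, no GPU needed):

```python
import numpy as np

itemsize = np.dtype("f").itemsize  # float32 -> 4 bytes

def nbytes(n_frames):
    """Size in bytes of a (n_frames, 2048, 2048) float32 array."""
    return n_frames * 2048 * 2048 * itemsize

print(nbytes(1024) == 2**34)  # True: 1024 frames is exactly 2**34 bytes (~17.2 GB)
print(nbytes(1023) < 2**34)   # True: 1023 frames is just under the limit
```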
Doing the same with a plain C/CUDA program works (I can provide source code if needed).
Tried with the following configurations:
- Ubuntu 20.04, python 3.8, numpy 1.22.4, pycuda 2021.1 - Tesla A40, Cuda 11.4, driver 470.141
- Ubuntu 20.04, python 3.8, numpy 1.23.2, pycuda 2022.1 - Tesla V100, Cuda 10.1, driver 418.126
- Debian 11, python 3.9, numpy 1.22.4, pycuda 2021.1 - Quadro P6000, Cuda 11.2, driver 460.91
Perhaps it has to do with the use of `int` instead of `unsigned int` or `size_t`, but it looks like pycuda already uses an unsigned type, at least in `get_elwise_module`.
I suspect it's this code snippet you're alluding to:
https://github.com/inducer/pycuda/blob/6f60fe4eccde4ec1d7d1a50719222024d1034876/pycuda/elementwise.py#L56-L59
Have you tried changing those types to something bigger, say `unsigned long`?
Also here:
https://github.com/inducer/pycuda/blob/6f60fe4eccde4ec1d7d1a50719222024d1034876/pycuda/elementwise.py#L106-L109
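For intuition on why the breakdown happens at exactly this size, here is a quick sanity check of the index arithmetic (a sketch in plain Python, independent of pycuda's internals):

```python
n_bad = 1024 * 2048 * 2048  # 2**32 elements -> hangs
n_ok  = 1023 * 2048 * 2048  # works

UINT32_MAX = 2**32 - 1

# 2**32 elements does not fit in an unsigned 32-bit int...
print(n_bad > UINT32_MAX)   # True
print(n_ok <= UINT32_MAX)   # True

# ...so a 32-bit element count wraps around (C-style truncation),
# here to exactly 0:
print(n_bad & 0xFFFFFFFF)   # 0

# A 64-bit unsigned type (unsigned long on 64-bit Linux, or size_t)
# comfortably holds the count:
print(n_bad <= 2**64 - 1)   # True
```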
Thanks @inducer, it seems to solve the problem. Should I open a PR?
Do you think changing these lines is enough to fix this class of problems, i.e. are there other files I should be looking at?
Yes, I'd be happy to consider a PR. Thanks for offering!
If you're up for it, please look over `reduction.py` and `scan.py` for related issues.