pycuda uses unavailable compute capabilities on older versions of CUDA with new hardware
pycuda defaults to asking nvcc for the maximum compute capability reported by the GPU. This fails if the installed CUDA version doesn't support that compute capability. For instance, if you're trying to use a GTX 1080 with CUDA 7.5 you get error messages like:
ExecError: error invoking 'nvcc --preprocess -arch sm_61 -Ifile.cu --compiler-options -P': [Errno 2] No such file or directory
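A minimal way to reproduce this (a sketch, assuming a GTX 1080, i.e. compute capability 6.1, with CUDA 7.5's nvcc on the PATH; the kernel itself is just a placeholder):

```python
import pycuda.autoinit  # creates a context on the first device
from pycuda.compiler import SourceModule

# No arch is passed, so pycuda derives sm_61 from the device and hands it
# to nvcc, which CUDA 7.5 does not recognize.
mod = SourceModule("""
__global__ void noop() { }
""")
```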
The solution seems to be to use the highest compute capability that is supported by both the installed CUDA version and the card, but I'm not sure of the best way to do that.
You can force an arch by passing an argument to SourceModule: https://documen.tician.de/pycuda/driver.html#pycuda.compiler.SourceModule
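Something along these lines (sm_52 is just a stand-in for whatever architecture you actually want to target):

```python
import pycuda.autoinit  # creates a context on the first device
from pycuda.compiler import SourceModule

# Explicitly target an architecture the installed toolkit knows about,
# instead of the one pycuda derives from the device.
mod = SourceModule("""
__global__ void noop() { }
""", arch="sm_52")
```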
I'd be happy to take a patch/pull request that reads an environment variable (e.g. PYCUDA_DEFAULT_JIT_ARCH).
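Roughly, the logic could look like this (PYCUDA_DEFAULT_JIT_ARCH and _default_arch are only proposed/illustrative names; neither exists in pycuda today):

```python
import os

def _default_arch(device):
    """Choose the arch to hand to nvcc when the caller didn't specify one.

    Honours the proposed PYCUDA_DEFAULT_JIT_ARCH environment variable,
    falling back to the device's own compute capability, which is what
    pycuda does today.
    """
    env_arch = os.environ.get("PYCUDA_DEFAULT_JIT_ARCH")
    if env_arch:
        # e.g. PYCUDA_DEFAULT_JIT_ARCH=sm_52
        return env_arch
    major, minor = device.compute_capability()
    return "sm_%d%d" % (major, minor)
```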
Is there an easy way to determine the maximum compute capability supported by the version of CUDA in use? It seems like we want to use an arch which is min(max supported by CUDA, max supported by device).
Short of parsing nvcc output, I don't think so.
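If somebody wanted to go down that road, a sketch of the min(CUDA, device) idea could look like the following (max_nvcc_arch and pick_arch are made-up helper names, and scraping --help output is obviously fragile and varies between CUDA releases):

```python
import re
import subprocess

def max_nvcc_arch(nvcc="nvcc"):
    """Highest (major, minor) target the toolkit's nvcc advertises.

    nvcc --help lists the allowed values for --gpu-architecture
    ('sm_20', 'sm_30', ...); this is a heuristic, not a stable interface.
    """
    help_text = subprocess.check_output([nvcc, "--help"]).decode()
    archs = {(int(m.group(1)), int(m.group(2)))
             for m in re.finditer(r"sm_(\d)(\d)", help_text)}
    return max(archs)

def pick_arch(device):
    # min(max supported by CUDA, max supported by the device), as "sm_XY"
    return "sm_%d%d" % min(max_nvcc_arch(), device.compute_capability())

# Usage:
#   import pycuda.driver as cuda
#   cuda.init()
#   print(pick_arch(cuda.Device(0)))
```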
Having an environment variable like PYCUDA_DEFAULT_JIT_ARCH would be very useful.
For custom kernels you can indeed use the arch argument, but this is not possible for ElementwiseKernel or ReductionKernel (and I guess the parallel scan kernels as well, but I do not use those).