pycuda uses unavailable compute capabilities on older versions of CUDA with new hardware
pycuda defaults to asking nvcc for the maximum compute capability reported by the GPU. This fails if the installed CUDA version doesn't support that compute capability. For instance, if you're trying to use a GTX 1080 with CUDA 7.5 you get error messages like:
ExecError: error invoking 'nvcc --preprocess -arch sm_61 -Ifile.cu --compiler-options -P': [Errno 2] No such file or directory
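A minimal way to reproduce this (a sketch, assuming a GTX 1080, i.e. compute capability 6.1, with CUDA 7.5's nvcc on the PATH; the kernel itself is just a placeholder):

```python
import pycuda.autoinit  # creates a context on the first device
from pycuda.compiler import SourceModule

# No arch is passed, so pycuda derives sm_61 from the device and hands it
# to nvcc, which CUDA 7.5 does not recognize.
mod = SourceModule("""
__global__ void noop() { }
""")
```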
The solution seems to be to use the highest compute capability that is supported by both the installed CUDA version and the card, but I'm not sure of the best way to do that.
You can force an arch by passing an argument to SourceModule: https://documen.tician.de/pycuda/driver.html#pycuda.compiler.SourceModule
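Something along these lines (sm_52 is just a stand-in for whatever architecture you actually want to target):

```python
import pycuda.autoinit  # creates a context on the first device
from pycuda.compiler import SourceModule

# Explicitly target an architecture the installed toolkit knows about,
# instead of the one pycuda derives from the device.
mod = SourceModule("""
__global__ void noop() { }
""", arch="sm_52")
```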
I'd be happy to take a patch/pull request that reads an environment variable (e.g. PYCUDA_DEFAULT_JIT_ARCH).
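Roughly, the logic could look like this (PYCUDA_DEFAULT_JIT_ARCH and _default_arch are only proposed/illustrative names; neither exists in pycuda today):

```python
import os

def _default_arch(device):
    """Choose the arch to hand to nvcc when the caller didn't specify one.

    Honours the proposed PYCUDA_DEFAULT_JIT_ARCH environment variable,
    falling back to the device's own compute capability, which is what
    pycuda does today.
    """
    env_arch = os.environ.get("PYCUDA_DEFAULT_JIT_ARCH")
    if env_arch:
        # e.g. PYCUDA_DEFAULT_JIT_ARCH=sm_52
        return env_arch
    major, minor = device.compute_capability()
    return "sm_%d%d" % (major, minor)
```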
Is there an easy way to determine the maximum compute capability supported by the version of CUDA in use? It seems like we want to use an arch which is min(max supported by CUDA, max supported by device).
Short of parsing nvcc output, I don't think so.
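If somebody wanted to go down that road, a sketch of the min(CUDA, device) idea could look like the following (max_nvcc_arch and pick_arch are made-up helper names, and scraping --help output is obviously fragile and varies between CUDA releases):

```python
import re
import subprocess

def max_nvcc_arch(nvcc="nvcc"):
    """Highest (major, minor) target the toolkit's nvcc advertises.

    nvcc --help lists the allowed values for --gpu-architecture
    ('sm_20', 'sm_30', ...); this is a heuristic, not a stable interface.
    """
    help_text = subprocess.check_output([nvcc, "--help"]).decode()
    archs = {(int(m.group(1)), int(m.group(2)))
             for m in re.finditer(r"sm_(\d)(\d)", help_text)}
    return max(archs)

def pick_arch(device):
    # min(max supported by CUDA, max supported by the device), as "sm_XY"
    return "sm_%d%d" % min(max_nvcc_arch(), device.compute_capability())

# Usage:
#   import pycuda.driver as cuda
#   cuda.init()
#   print(pick_arch(cuda.Device(0)))
```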
Having an environment variable like PYCUDA_DEFAULT_JIT_ARCH would be very useful.
For custom kernels you can indeed use the arch argument, but this is not possible for ElementwiseKernel or ReductionKernel (and I guess the parallel scan kernels as well, but I do not use those).