Hybridizer kernels fail on V100 and P100
Samples work fine on my MSI RTX 2080 SUPER, but kernel launches fail on Azure VMs with Tesla V100 and P100 GPUs. Here is an example of cuda-memcheck output for the Builtin sample project:
========= CUDA-MEMCHECK
sum = 0
========= Program hit CUDA_ERROR_INVALID_SOURCE (error 300) due to "device kernel image is invalid" on CUDA API call to cuModuleLoadData.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\windows\SYSTEM32\nvcuda.dll (cuMemcpy2DAsync + 0x1ca844) [0x1d8f7b]
========= Host Frame:D:\1\Builtin_CUDA.dll (Builtinx46Programx46Run_ExternCWrapper_CUDA + 0x85) [0x1695]
========= Host Frame:[0x7ffc4e3b60a1]
=========
========= ERROR SUMMARY: 1 error
My Hybridizer version is 1.3.0. After creating an Azure VM, NC6s_v3 (which has a V100) or NC6s_v2 (P100), I installed CUDA 10.1 and the C++ redistributable 14.24.28127.4; the OS was either Windows 10 or Windows Server 2019. I also noticed that the result of the kernel call is not checked in the sample app: on my local box it returns 0, but on the Azure VMs it returns 999.
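For reference, the two error values seen above can be decoded against the CUDA driver API error enumeration (CUresult in cuda.h). A minimal lookup sketch in Python; the table only covers the codes that appear in this report:

```python
# Subset of the CUDA driver API error codes (CUresult in cuda.h),
# limited to the values that appear in this report.
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    300: "CUDA_ERROR_INVALID_SOURCE",  # "device kernel image is invalid"
    999: "CUDA_ERROR_UNKNOWN",         # the 999 returned on the Azure VMs
}

def decode(code: int) -> str:
    """Map a raw CUresult value to its symbolic name, if known."""
    return CU_RESULT_NAMES.get(code, f"unrecognized CUresult ({code})")

print(decode(300))  # CUDA_ERROR_INVALID_SOURCE
print(decode(999))  # CUDA_ERROR_UNKNOWN
```

So the unchecked 999 is CUDA_ERROR_UNKNOWN, consistent with the earlier module-load failure leaving the context in a bad state.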
I guess the problem is that Hybridizer detects and builds code for the local GPU architecture. I suspect this can be configured somehow, but I cannot figure out how and can't find any documentation. Changing --gpu-architecture in the Hybridizer JITTER Options from "auto" to "compute_70" does not fix it, and values like "sm_70" fail the build; that was a blind shot. I also wanted to try building in a VM with a Tesla GPU, but the licensing service seems to be down: after clicking the "Refresh" button, the license settings dialog just says "Loading" and nothing more. The same behavior on my original dev box.
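For context on why a local build might not load on the Azure GPUs: the RTX 2080 SUPER is compute capability 7.5 (Turing), the V100 is 7.0 (Volta), and the P100 is 6.0 (Pascal). A binary compiled only for sm_75 cannot be loaded on either Tesla card, which would produce exactly CUDA_ERROR_INVALID_SOURCE from cuModuleLoadData. A small sketch of the mapping (the names follow nvcc's virtual/real architecture convention; the helper itself is illustrative, not part of Hybridizer):

```python
# Compute capabilities of the GPUs mentioned in this report.
GPUS = {
    "RTX 2080 SUPER": (7, 5),  # Turing
    "Tesla V100": (7, 0),      # Volta
    "Tesla P100": (6, 0),      # Pascal
}

def arch_names(major: int, minor: int) -> tuple:
    """Return (virtual, real) architecture names, nvcc-style."""
    return (f"compute_{major}{minor}", f"sm_{major}{minor}")

for gpu, cc in GPUS.items():
    virtual, real = arch_names(*cc)
    print(f"{gpu}: {virtual} / {real}")
```

If that hypothesis is right, a build targeting compute_60 (PTX) should in principle JIT on both Tesla cards, since the driver can forward-compile PTX for a newer device but never load a cubin built for a different architecture.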