RuntimeError: CUDA error: no kernel image is available for execution
Hello @cattaneod,
We are hitting a runtime error during training.
Could you please help us with it?
: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
"please use get_last_lr().", UserWarning)
ERROR - CMRNet - Failed after 0:00:35!
Traceback (most recent calls WITHOUT Sacred internals):
File "/fs/scratch/XC_EXP_IN/CMRNet/main_visibility_CALIB.py", line 316, in main
depth_img[depth_img == 1000.] = 0.
RuntimeError: CUDA error: no kernel image is available for execution on the devi
Thanks,
Hello @cattaneod
Regarding the kernel launch failure errors:
- The CMRNet module's setup.py lacks the nvcc_args and cxx_args that the correlation_package's setup.py has. Adding them would help ensure that the correct GPU architecture(s) are targeted. I think this might be the cause of the kernel launch failure errors.
- Also, see the last two lines of the example in the Best Practices Guide (https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#building-for-maximum-compatibility):
-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75
The second flag embeds PTX code that can be JIT-translated at load time. The corresponding lines in the correlation_package may need to be updated as well.
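As a rough illustration of the flags discussed above, here is a minimal sketch of how the nvcc_args list could be assembled for a setup.py. The helper name gencode_flags, the architecture list, and the cxx_args value are assumptions chosen for the example, not taken from the CMRNet or correlation_package sources; pick the architectures that match your own GPUs.

```python
def gencode_flags(sm_archs, ptx_arch=None):
    """Build nvcc -gencode flags.

    sm_archs: compute capabilities (as strings, e.g. "75") to compile
              native binary code for.
    ptx_arch: optional capability to also embed as PTX, which the driver
              can JIT-translate for GPUs without a matching binary.
    """
    flags = [f"-gencode=arch=compute_{a},code=sm_{a}" for a in sm_archs]
    if ptx_arch is not None:
        flags.append(f"-gencode=arch=compute_{ptx_arch},code=compute_{ptx_arch}")
    return flags


# Assumed example targets; adjust to the GPUs you actually run on.
nvcc_args = gencode_flags(["60", "70", "75"], ptx_arch="75")
cxx_args = ["-O3"]  # placeholder host-compiler flag, an assumption

# In setup.py these lists would be handed to the extension, e.g.:
#   from torch.utils.cpp_extension import CUDAExtension
#   CUDAExtension(name, sources,
#                 extra_compile_args={"cxx": cxx_args, "nvcc": nvcc_args})
# where name and sources are whatever the module's setup.py already uses.
print(nvcc_args)
```

After changing the flags, the extension has to be rebuilt for the new architectures to take effect.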
With these changes, on top of the ones already discussed in other issues, it is working. The other changes were:
- uv and uv2: need .contiguous() added in main_visibility_CALIB.py and evaluate_iterative_single_CALIB.py
- total_trasl_error = torch.tensor(0.0, device=target_transl.device) in main_visibility_CALIB.py
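For anyone applying the two changes above, this is a small sketch of what they look like in isolation. The tensors here are hypothetical stand-ins (the real uv and target_transl come from the training scripts), and the reason given for .contiguous(), that the custom CUDA code expects densely laid out memory, is my assumption.

```python
import torch

# Hypothetical stand-ins for the tensors used in main_visibility_CALIB.py.
target_transl = torch.zeros(4, 3)
uv = torch.arange(12.0).reshape(2, 6).t()  # transposed view of a 2x6 tensor

# A transposed view shares storage with the original tensor and is not
# laid out densely in memory.
print(uv.is_contiguous())

# Copy into a dense layout before handing the tensor to custom CUDA code.
uv = uv.contiguous()

# Create the accumulator directly on the same device as the target tensor,
# so later additions do not mix CPU and GPU tensors.
total_trasl_error = torch.tensor(0.0, device=target_transl.device)
print(uv.is_contiguous(), total_trasl_error.device)
```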