
RuntimeError: CUDA error: no kernel image is available for execution

Open Sabyasachi6215 opened this issue 3 years ago • 1 comments

Hello @cattaneod,

We are hitting a runtime error during training.

Could you please help us with it?

: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
"please use get_last_lr().", UserWarning)
ERROR - CMRNet - Failed after 0:00:35!
Traceback (most recent calls WITHOUT Sacred internals):
File "/fs/scratch/XC_EXP_IN/CMRNet/main_visibility_CALIB.py", line 316, in main
depth_img[depth_img == 1000.] = 0.
RuntimeError: CUDA error: no kernel image is available for execution on the devi

Thanks,

Sabyasachi6215, Jun 28 '22 07:06

Hello @cattaneod

Regarding the kernel launch failure errors:

  1. The CMRNet module's setup.py lacks the nvcc_args and cxx_args that the correlation_package's setup.py has. Adding them would help make sure that the correct GPU architecture(s) are targeted. I think the missing flags might be the cause of the kernel launch failure errors.

  2. Also, see the Best Practices Guide: https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#building-for-maximum-compatibility, in particular the last two lines:

-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75

The second flag additionally embeds PTX code that can be JIT-compiled at load time. The corresponding lines in the correlation_package may need to be updated as well.
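To make the flag pattern above concrete, here is a minimal sketch of a helper that builds such a list. The name `gencode_flags` and the idea of generating the flags from a list of compute capabilities are my own assumptions for illustration; they are not taken from the CMRNet or correlation_package sources.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags (hypothetical helper, not part of CMRNet).

    Emits native code (sm_XY) for every listed compute capability and, for
    the highest one only, an extra PTX entry (code=compute_XY) that the
    driver can JIT-compile on newer GPUs.
    """
    # Parse "7.5" into (7, 5), drop duplicates, sort ascending.
    parsed = sorted({tuple(int(p) for p in cap.split(".")) for cap in capabilities})
    if not parsed:
        raise ValueError("at least one compute capability is required")

    flags = [
        f"-gencode=arch=compute_{major}{minor},code=sm_{major}{minor}"
        for major, minor in parsed
    ]
    # PTX for the newest architecture, for forward compatibility.
    major, minor = parsed[-1]
    flags.append(f"-gencode=arch=compute_{major}{minor},code=compute_{major}{minor}")
    return flags


nvcc_args = gencode_flags(["7.5"])
print(nvcc_args)
```

A list like this could then be passed as the "nvcc" entry of `extra_compile_args` on `torch.utils.cpp_extension.CUDAExtension` in setup.py. The capability to target depends on your GPU; 7.5 is only the value from the guide's example.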

With these changes, in addition to the ones already discussed in other issues, it's working. The other changes were:

  1. uv and uv2: add .contiguous() in main_visibility_CALIB.py and evaluate_iterative_single_CALIB.py
  2. total_trasl_error = torch.tensor(0.0, device=target_transl.device) in main_visibility_CALIB.py
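For anyone unsure what these two changes do, here is a small CPU-only sketch. The tensor shapes and values are made up for illustration and do not come from CMRNet. My guess is that the first change is needed because the custom kernels expect contiguous memory, and the second because the accumulator has to live on the same device as the tensors added to it, but that reasoning is an assumption on my part.

```python
import torch

# Hypothetical stand-in for uv; the real tensor in CMRNet will look different.
uv = torch.arange(6.0).reshape(2, 3).t()  # a transposed view is not contiguous
print(uv.is_contiguous())

# .contiguous() returns a tensor with the same values in a dense memory layout.
uv = uv.contiguous()
print(uv.is_contiguous())

# Hypothetical stand-in for target_transl (on the GPU during real training).
target_transl = torch.zeros(4, 3)

# Create the accumulator on the same device as the tensor it will be summed with.
total_trasl_error = torch.tensor(0.0, device=target_transl.device)
total_trasl_error += (target_transl - 1.0).norm(dim=1).sum()
print(total_trasl_error.device)
```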

RaviBeagle, Jun 30 '22 12:06