isaac_ros_common

run_dev.sh: using `--gpus` instead of `--runtime nvidia`

Open • Interpause opened this issue 2 years ago • 3 comments

https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common/blob/6d3c5c00e0e2b3fc1d75eb4286848d23b05d6dca/scripts/run_dev.sh#L195-L203

I noticed that the Docker container started by run_dev.sh works (`torch.cuda.is_available()` returns `True`) if I replace `--runtime nvidia` with `--gpus all`. I also noticed that the dev environment setup guide (https://nvidia-isaac-ros.github.io/getting_started/dev_env_setup.html) says `nvidia-container-runtime` is deprecated. Is `--gpus all` the more suitable option on newer versions of Docker?

Interpause, Nov 06 '23 12:11
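A minimal sketch of the two invocations being compared, written in Python only to make the difference explicit. The helper names `gpu_flags` and `docker_run_cmd` and the image name are hypothetical and are not part of run_dev.sh; the in-container check simply runs `torch.cuda.is_available()`, as in the report above. As I understand the NVIDIA Container Toolkit docs, `--runtime nvidia` selects a runtime registered with the Docker daemon and leaves GPU selection to the `NVIDIA_VISIBLE_DEVICES` environment variable (often set by CUDA base images), whereas `--gpus all` asks Docker itself to request every GPU.

```python
import shlex
from typing import List

# Command run inside the container to check CUDA visibility (from the report above).
CUDA_CHECK = "import torch; print(torch.cuda.is_available())"


def gpu_flags(use_gpus_flag: bool) -> List[str]:
    """Hypothetical helper: GPU-related `docker run` arguments for each approach."""
    if use_gpus_flag:
        # Docker's built-in GPU request (Docker 19.03+), serviced by the NVIDIA Container Toolkit.
        return ["--gpus", "all"]
    # Selects the "nvidia" runtime, which must be registered with the Docker daemon.
    return ["--runtime", "nvidia"]


def docker_run_cmd(image: str, use_gpus_flag: bool) -> List[str]:
    """Hypothetical helper: build a one-off `docker run` that prints CUDA availability."""
    return ["docker", "run", "--rm", *gpu_flags(use_gpus_flag),
            image, "python3", "-c", CUDA_CHECK]


if __name__ == "__main__":
    image = "my-isaac-ros-dev-image"  # hypothetical image name
    for use_gpus in (False, True):
        # Print rather than execute, so the sketch runs without Docker or a GPU.
        print(shlex.join(docker_run_cmd(image, use_gpus)))
```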

`--gpus all` should enable the same runtime behavior, but we need to confirm with the nvidia-container-runtime engineers. Thanks for the heads-up.

hemalshahNV, Nov 06 '23 23:11

Has this been changed? If I use `--gpus` instead of `--runtime` in run_dev.sh, I get the following error:

```
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
```

YuminosukeSato, Nov 17 '23 00:11

@Buddies-as-you-know, could you confirm which version of the CUDA drivers you have installed? The missing `libnvidia-ml.so.1` library should be included as part of a proper CUDA installation.

jaiveersinghNV, Nov 20 '23 23:11
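A minimal host-side check, assuming Python 3 is available on the host: it tries to load the same `libnvidia-ml.so.1` that the container hook failed to open and, if that works, asks NVML for the installed driver version, which also answers the version question above. It calls the NVML C functions `nvmlInit_v2`, `nvmlSystemGetDriverVersion` and `nvmlShutdown` through `ctypes`; the 80-byte buffer is an assumption mirroring `NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE` in `nvml.h`, and the function name `host_driver_version` is hypothetical.

```python
import ctypes
from typing import Optional

NVML_SUCCESS = 0          # NVML return code for success
VERSION_BUFFER_SIZE = 80  # assumed to match NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE


def host_driver_version() -> Optional[str]:
    """Hypothetical helper: return the NVIDIA driver version reported by NVML,
    or None if libnvidia-ml.so.1 cannot be loaded or initialised on this host."""
    try:
        # The same shared library the nvidia-container-cli hook failed to open.
        nvml = ctypes.CDLL("libnvidia-ml.so.1")
    except OSError:
        return None

    if nvml.nvmlInit_v2() != NVML_SUCCESS:
        return None
    try:
        buf = ctypes.create_string_buffer(VERSION_BUFFER_SIZE)
        status = nvml.nvmlSystemGetDriverVersion(buf, ctypes.c_uint(VERSION_BUFFER_SIZE))
        if status != NVML_SUCCESS:
            return None
        return buf.value.decode()
    finally:
        # Release NVML resources acquired by nvmlInit_v2.
        nvml.nvmlShutdown()


if __name__ == "__main__":
    version = host_driver_version()
    if version is None:
        print("libnvidia-ml.so.1 could not be loaded: check the NVIDIA driver installation")
    else:
        print(f"NVIDIA driver version: {version}")
```

If this returns `None` on the host, the `--gpus` failure above is likely, since the hook has no driver library to load; when the library is present, `nvidia-smi` should report the same driver version.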