nvidia-docker

nvml error: insufficient permissions

Open ethanabrooks opened this issue 4 years ago • 2 comments

1. Issue or feature description

Running nvidia-smi in nvidia-docker raises an error.

2. Steps to reproduce the issue

❯ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: insufficient permissions: unknown.

3. Information to attach (optional if deemed irrelevant)

  • [x] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info: https://gist.github.com/ethanabrooks/bc2d9cc3c3fb61e0b188b18bcb2fe15e
  • [x] Kernel version from uname -a: Linux rldl8 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • [x] Any relevant kernel output lines from dmesg: https://gist.github.com/ethanabrooks/a2b1d485499fb724e32a89e8f3ed218a
  • [x] Driver information from nvidia-smi -a:
Wed Jul 21 19:00:09 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
| 29%   16C    P8    10W / 250W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:03:00.0 Off |                  N/A |
| 29%   14C    P8     7W / 250W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:82:00.0 Off |                  N/A |
| 29%   15C    P8     7W / 250W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:83:00.0 Off |                  N/A |
| 29%   15C    P8     8W / 250W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
  • [x] Docker version from docker version:
Client: Docker Engine - Community
Version:           20.10.6
API version:       1.41
Go version:        go1.13.15
Git commit:        370c289
Built:             Fri Apr  9 22:46:01 2021
OS/Arch:           linux/amd64
Context:           default
Experimental:      true

Server: Docker Engine - Community
Engine:
 Version:          20.10.6
 API version:      1.41 (minimum version 1.12)
 Go version:       go1.13.15
 Git commit:       8728dd2
 Built:            Fri Apr  9 22:44:13 2021
 OS/Arch:          linux/amd64
 Experimental:     false
containerd:
 Version:          1.4.4
 GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
runc:
 Version:          1.0.0-rc93
 GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
docker-init:
 Version:          0.19.0
 GitCommit:        de40ad0
  • [x] NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*':
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                   Version                  Architecture             Description
+++-======================================-========================-========================-==================================================================================
un  libgldispatch0-nvidia                  <none>                   <none>                   (no description available)
ii  libnvidia-cfg1-470:amd64               470.57.02-0ubuntu1       amd64                    NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                     <none>                   <none>                   (no description available)
un  libnvidia-common                       <none>                   <none>                   (no description available)
ii  libnvidia-common-470                   470.57.02-0ubuntu1       all                      Shared files used by the NVIDIA libraries
ii  libnvidia-compute-470:amd64            470.57.02-0ubuntu1       amd64                    NVIDIA libcompute package
ii  libnvidia-container-tools              1.4.0-1                  amd64                    NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64             1.4.0-1                  amd64                    NVIDIA container runtime library
un  libnvidia-decode                       <none>                   <none>                   (no description available)
ii  libnvidia-decode-470:amd64             470.57.02-0ubuntu1       amd64                    NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                       <none>                   <none>                   (no description available)
ii  libnvidia-encode-470:amd64             470.57.02-0ubuntu1       amd64                    NVENC Video Encoding runtime library
un  libnvidia-extra                        <none>                   <none>                   (no description available)
ii  libnvidia-extra-470:amd64              470.57.02-0ubuntu1       amd64                    Extra libraries for the NVIDIA driver
un  libnvidia-fbc1                         <none>                   <none>                   (no description available)
ii  libnvidia-fbc1-470:amd64               470.57.02-0ubuntu1       amd64                    NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                           <none>                   <none>                   (no description available)
ii  libnvidia-gl-470:amd64                 470.57.02-0ubuntu1       amd64                    NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-ifr1                         <none>                   <none>                   (no description available)
ii  libnvidia-ifr1-470:amd64               470.57.02-0ubuntu1       amd64                    NVIDIA OpenGL-based Inband Frame Readback runtime library
un  libnvidia-ml1                          <none>                   <none>                   (no description available)
un  nvidia-304                             <none>                   <none>                   (no description available)
un  nvidia-340                             <none>                   <none>                   (no description available)
un  nvidia-384                             <none>                   <none>                   (no description available)
un  nvidia-390                             <none>                   <none>                   (no description available)
un  nvidia-common                          <none>                   <none>                   (no description available)
ii  nvidia-compute-utils-470               470.57.02-0ubuntu1       amd64                    NVIDIA compute utilities
un  nvidia-container-runtime               <none>                   <none>                   (no description available)
un  nvidia-container-runtime-hook          <none>                   <none>                   (no description available)
ii  nvidia-container-toolkit               1.5.1-1                  amd64                    NVIDIA container runtime hook
ii  nvidia-dkms-470                        470.57.02-0ubuntu1       amd64                    NVIDIA DKMS package
un  nvidia-dkms-kernel                     <none>                   <none>                   (no description available)
ii  nvidia-driver-470                      470.57.02-0ubuntu1       amd64                    NVIDIA driver metapackage
un  nvidia-driver-binary                   <none>                   <none>                   (no description available)
un  nvidia-kernel-common                   <none>                   <none>                   (no description available)
ii  nvidia-kernel-common-470               470.57.02-0ubuntu1       amd64                    Shared files used with the kernel module
un  nvidia-kernel-source                   <none>                   <none>                   (no description available)
ii  nvidia-kernel-source-470               470.57.02-0ubuntu1       amd64                    NVIDIA kernel source package
un  nvidia-legacy-340xx-vdpau-driver       <none>                   <none>                   (no description available)
un  nvidia-libopencl1-dev                  <none>                   <none>                   (no description available)
ii  nvidia-modprobe                        470.57.02-0ubuntu1       amd64                    Load the NVIDIA kernel driver and create device files
un  nvidia-opencl-icd                      <none>                   <none>                   (no description available)
un  nvidia-persistenced                    <none>                   <none>                   (no description available)
ii  nvidia-prime                           0.8.16~0.18.04.1         all                      Tools to enable NVIDIA's Prime
ii  nvidia-settings                        470.57.02-0ubuntu1       amd64                    Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                 <none>                   <none>                   (no description available)
un  nvidia-smi                             <none>                   <none>                   (no description available)
un  nvidia-utils                           <none>                   <none>                   (no description available)
ii  nvidia-utils-470                       470.57.02-0ubuntu1       amd64                    NVIDIA driver support binaries
un  nvidia-vdpau-driver                    <none>                   <none>                   (no description available)
ii  xserver-xorg-video-nvidia-470          470.57.02-0ubuntu1       amd64                    NVIDIA binary Xorg driver
  • [x] NVIDIA container library version from nvidia-container-cli -V:
version: 1.4.0
build date: 2021-04-24T14:25+00:00
build revision: 704a698b7a0ceec07a48e56c37365c741718c2df
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
I0721 22:54:19.158082 3729 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0721 22:54:19.158192 3729 nvc.c:346] using root /
I0721 22:54:19.158207 3729 nvc.c:347] using ldcache /etc/ld.so.cache
I0721 22:54:19.158220 3729 nvc.c:348] using unprivileged user 65534:65534
I0721 22:54:19.158255 3729 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0721 22:54:19.158488 3729 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
I0721 22:54:19.164111 3736 nvc.c:274] loading kernel module nvidia
I0721 22:54:19.164436 3736 nvc.c:278] running mknod for /dev/nvidiactl
I0721 22:54:19.164516 3736 nvc.c:282] running mknod for /dev/nvidia0
I0721 22:54:19.164572 3736 nvc.c:282] running mknod for /dev/nvidia1
I0721 22:54:19.164624 3736 nvc.c:282] running mknod for /dev/nvidia2
I0721 22:54:19.164675 3736 nvc.c:282] running mknod for /dev/nvidia3
I0721 22:54:19.164727 3736 nvc.c:286] running mknod for all nvcaps in /dev/nvidia-caps
I0721 22:54:19.176785 3736 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0721 22:54:19.177158 3736 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0721 22:54:19.184903 3736 nvc.c:292] loading kernel module nvidia_uvm
I0721 22:54:19.184969 3736 nvc.c:296] running mknod for /dev/nvidia-uvm
I0721 22:54:19.185072 3736 nvc.c:301] loading kernel module nvidia_modeset
I0721 22:54:19.185123 3736 nvc.c:305] running mknod for /dev/nvidia-modeset
I0721 22:54:19.185489 3737 driver.c:101] starting driver service
I0721 22:54:19.190228 3729 driver.c:203] driver service terminated with signal 15

-- WARNING, the following logs are for debugging purposes only --

I0721 22:55:58.102241 4055 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0721 22:55:58.102351 4055 nvc.c:346] using root /
I0721 22:55:58.102370 4055 nvc.c:347] using ldcache /etc/ld.so.cache
I0721 22:55:58.102386 4055 nvc.c:348] using unprivileged user 65534:65534
I0721 22:55:58.102428 4055 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0721 22:55:58.102674 4055 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
I0721 22:55:58.107939 4063 nvc.c:274] loading kernel module nvidia
I0721 22:55:58.108213 4063 nvc.c:278] running mknod for /dev/nvidiactl
I0721 22:55:58.108301 4063 nvc.c:282] running mknod for /dev/nvidia0
I0721 22:55:58.108361 4063 nvc.c:282] running mknod for /dev/nvidia1
I0721 22:55:58.108418 4063 nvc.c:282] running mknod for /dev/nvidia2
I0721 22:55:58.108475 4063 nvc.c:282] running mknod for /dev/nvidia3
I0721 22:55:58.108530 4063 nvc.c:286] running mknod for all nvcaps in /dev/nvidia-caps
I0721 22:55:58.120627 4063 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0721 22:55:58.120952 4063 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0721 22:55:58.129024 4063 nvc.c:292] loading kernel module nvidia_uvm
I0721 22:55:58.129091 4063 nvc.c:296] running mknod for /dev/nvidia-uvm
I0721 22:55:58.129212 4063 nvc.c:301] loading kernel module nvidia_modeset
I0721 22:55:58.129266 4063 nvc.c:305] running mknod for /dev/nvidia-modeset
I0721 22:55:58.129659 4064 driver.c:101] starting driver service
I0721 22:55:58.134859 4055 driver.c:203] driver service terminated with signal 15
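
The logs above show nvidia-container-cli switching to the unprivileged user 65534:65534 and creating the /dev/nvidia* device nodes before the driver service terminates. One thing that may be worth checking (an assumption on my part, not a confirmed cause of this error) is whether those device nodes are readable and writable by users other than the owner and group. A minimal Python sketch of that check follows; the function names `world_rw` and `check_nvidia_device_nodes` are hypothetical helpers, not part of any NVIDIA tool.

```python
import glob
import os
import stat


def world_rw(mode: int) -> bool:
    """Return True if 'other' users have both read and write permission."""
    needed = stat.S_IROTH | stat.S_IWOTH
    return (mode & needed) == needed


def check_nvidia_device_nodes(pattern: str = "/dev/nvidia*") -> dict:
    """Map each matching character device to whether it is world read/writable.

    Directories such as /dev/nvidia-caps also match the default pattern;
    they are skipped because only character devices are inspected.
    """
    results = {}
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        if stat.S_ISCHR(st.st_mode):
            results[path] = world_rw(st.st_mode)
    return results


if __name__ == "__main__":
    # Print one line per device node found on this host.
    for path, ok in check_nvidia_device_nodes().items():
        status = "world read/write" if ok else "restricted for other users"
        print(f"{path}: {status}")
```

If any node is reported as restricted, that would be consistent with an unprivileged process being refused access, but it does not prove that this is what NVML is complaining about here.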

ethanabrooks, Jul 21 '21 23:07

I have the same issue; see #1547.

zacario-li, Sep 17 '21 01:09

Solved; see #1547.

zacario-li, Sep 22 '21 10:09