NVIDIA/CUDA Docker containers nested inside an LXD container fail if the LXD container runs unprivileged (OK if privileged)
1. Issue or feature description
I am trying to run an NVIDIA/CUDA Docker container from within an LXD container (a nested scenario). It seems the only way to get such an NVIDIA Docker container working is to make the LXD container privileged. Inside a privileged LXD container, the following works perfectly fine:
docker run --rm --gpus all --ipc=host nvidia/cuda:11.4.1-base-ubuntu20.04 nvidia-smi
If I run the very same LXD container as unprivileged, the nested CUDA Docker container fails with the error below. Other (non-NVIDIA/CUDA) Docker containers work fine.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: write error: /sys/fs/cgroup/devices/docker/098ad8bf1fdcf4ab72091864933fbc8b67a8f0b30746681ba6ef4082c23245b9/devices.allow: operation not permitted: unknown.
On the LXD discussion group, it was suggested to treat the error as non-fatal in the case of nested containers: https://discuss.linuxcontainers.org/t/nvidia-and-docker-in-lxd/12136
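Another workaround sometimes suggested for this class of failure (a sketch, not verified in this setup) is to tell nvidia-container-cli to skip the devices-cgroup writes entirely, since LXD already manages device access for the nested container:

```toml
# /etc/nvidia-container-runtime/config.toml inside the LXD container
# (assumption: the installed nvidia-container-toolkit honors this key)
[nvidia-container-cli]
# Skip writes to /sys/fs/cgroup/devices/.../devices.allow, which are
# rejected with EPERM inside an unprivileged user namespace.
no-cgroups = true
```

With no-cgroups enabled, the /dev/nvidia* device nodes may need to be passed to docker run explicitly (e.g. via --device), since the hook no longer whitelists them in the devices cgroup.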
2. Steps to reproduce the issue
- create a new (by default unprivileged) LXD container with nesting allowed and an NVIDIA GPU attached:
lxc launch ubuntu:21.04 demo3 -c security.nesting=true -c security.syscalls.intercept.mknod=true -c security.syscalls.intercept.setxattr=true
lxc storage create docker btrfs
lxc storage volume create docker demo3
lxc config device add demo3 docker disk pool=docker source=demo3 path=/var/lib/docker
lxc restart demo3
lxc config device add demo3 gpu gpu
- install the NVIDIA driver within the LXD container (currently required, as the LXD options security.privileged=true and nvidia.runtime=true still do not work together)
lxc exec demo3 -- bash
apt update
apt install --no-install-recommends nvidia-driver-470
- nvidia-smi inside the LXD container works fine:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 16% 27C P8 16W / 250W | 178MiB / 11175MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
- install Docker inside the LXD container:
apt install apt-transport-https ca-certificates curl gnupg lsb-release -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io
- "typical" Docker containers work perfectly fine:
docker run --rm hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
- install nvidia-docker2:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=ubuntu20.04 && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list
apt update
apt install nvidia-docker2 -y
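At this point the nvidia-docker2 package should have registered the nvidia runtime with dockerd; its /etc/docker/daemon.json should look roughly like the following (sketch; contents may differ between package versions):

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```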
- restart the Docker daemon and verify Docker is working fine:
systemctl restart docker
docker version
Client: Docker Engine - Community
Version: 20.10.8
API version: 1.41
Go version: go1.16.6
Git commit: 3967b7d
Built: Fri Jul 30 19:53:57 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.8
API version: 1.41 (minimum version 1.12)
Go version: go1.16.6
Git commit: 75249d8
Built: Fri Jul 30 19:52:06 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.9
GitCommit: e25210fe30a0a703442421b0f60afac609f950a3
runc:
Version: 1.0.1
GitCommit: v1.0.1-0-g4144b63
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
scan: Docker Scan (Docker Inc., v0.8.0)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 1
Server Version: 20.10.8
Storage Driver: btrfs
Build Version: Btrfs v5.10.1
Library Version: 102
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e25210fe30a0a703442421b0f60afac609f950a3
runc version: v1.0.1-0-g4144b63
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.11.0-34-generic
Operating System: Ubuntu 21.04
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 62.75GiB
Name: demo4
ID: PTI6:4Q7T:PMWD:XC2L:5W2X:PMLV:7QRG:S3ZW:KMII:GCAY:PC7L:5P3X
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
- NVIDIA/CUDA Docker containers fail in an unprivileged LXD container, e.g.:
docker run --rm --gpus all --ipc=host nvidia/cuda:11.4.1-base-ubuntu20.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.4.1-base-ubuntu20.04' locally
11.4.1-base-ubuntu20.04: Pulling from nvidia/cuda
16ec32c2132b: Pull complete
d795373d028a: Pull complete
aa1a4de63ca7: Pull complete
99fe2b653f7a: Pull complete
151e201e5dbc: Pull complete
Digest: sha256:79b4fdc93e6e98fbb1770893b497d6528ab19cf056d15e366787135ca18b7565
Status: Downloaded newer image for nvidia/cuda:11.4.1-base-ubuntu20.04
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: write error: /sys/fs/cgroup/devices/docker/333969e7089a6ca8b93c493b34741c8e17d8d6fb5acaa16031c4a8fb54814286/devices.allow: operation not permitted: unknown.
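The failing write targets the cgroup v1 devices controller (note Cgroup Version: 1 in the docker info output above); inside an unprivileged user namespace, root is mapped to an unprivileged host UID, so the kernel rejects the devices.allow write with EPERM. A minimal sketch for checking which cgroup layout a container sees (the detection heuristic and function name are my own, for illustration):

```python
# Sketch: distinguish a cgroup v1 layout (per-controller directories
# such as "devices", where the nvidia hook writes devices.allow) from
# the unified cgroup v2 hierarchy.
from pathlib import Path

def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    base = Path(root)
    # cgroup v2 exposes a single unified hierarchy with a
    # cgroup.controllers file at its root; v1 does not.
    if (base / "cgroup.controllers").is_file():
        return 2
    return 1
```

On a cgroup v2 (unified) host this particular /sys/fs/cgroup/devices/... path would not exist; the v1 devices controller is what the nvidia hook is trying, and failing, to modify here.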
- make the LXD container privileged:
exit
lxc stop demo3
lxc config set demo3 security.privileged=true
lxc start demo3
- now the very same Nvidia Docker container runs fine within the privileged LXD container:
lxc exec demo3 -- bash
docker run --rm --gpus all --ipc=host nvidia/cuda:11.4.1-base-ubuntu20.04 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 16% 28C P8 16W / 250W | 178MiB / 11175MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
3. Information to attach (optional if deemed irrelevant)
- [X] Some nvidia-container information:
nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0913 20:19:32.954928 591 nvc.c:372] initializing library context (version=1.5.0, build=4699c1b8b4991b6d869ea403e109291653bb040b)
I0913 20:19:32.955339 591 nvc.c:346] using root /
I0913 20:19:32.955386 591 nvc.c:347] using ldcache /etc/ld.so.cache
I0913 20:19:32.955422 591 nvc.c:348] using unprivileged user 65534:65534
I0913 20:19:32.955509 591 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0913 20:19:32.956299 591 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W0913 20:19:32.956416 591 nvc.c:249] skipping kernel modules load due to user namespace
I0913 20:19:32.956870 592 driver.c:101] starting driver service
I0913 20:19:32.958866 591 nvc_info.c:750] requesting driver information with ''
I0913 20:19:32.959672 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.470.63.01
I0913 20:19:32.959733 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.470.63.01
I0913 20:19:32.959775 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.470.63.01
I0913 20:19:32.959809 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.470.63.01
I0913 20:19:32.959857 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.470.63.01
I0913 20:19:32.959900 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.470.63.01
I0913 20:19:32.959931 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.470.63.01
I0913 20:19:32.959965 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.470.63.01
I0913 20:19:32.960007 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.470.63.01
I0913 20:19:32.960053 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.470.63.01
I0913 20:19:32.960081 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.470.63.01
I0913 20:19:32.960115 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
I0913 20:19:32.960144 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.470.63.01
I0913 20:19:32.960189 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.470.63.01
I0913 20:19:32.960231 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.470.63.01
I0913 20:19:32.960260 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.470.63.01
I0913 20:19:32.960293 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.470.63.01
I0913 20:19:32.960335 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.470.63.01
I0913 20:19:32.960368 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.470.63.01
I0913 20:19:32.960414 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.470.63.01
I0913 20:19:32.960497 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
I0913 20:19:32.960557 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.470.63.01
I0913 20:19:32.960590 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.470.63.01
I0913 20:19:32.960617 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.470.63.01
I0913 20:19:32.960645 591 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.470.63.01
W0913 20:19:32.960660 591 nvc_info.c:392] missing library libnvidia-nscq.so
W0913 20:19:32.960665 591 nvc_info.c:392] missing library libnvidia-fatbinaryloader.so
W0913 20:19:32.960669 591 nvc_info.c:392] missing library libvdpau_nvidia.so
W0913 20:19:32.960674 591 nvc_info.c:396] missing compat32 library libnvidia-ml.so
W0913 20:19:32.960678 591 nvc_info.c:396] missing compat32 library libnvidia-cfg.so
W0913 20:19:32.960682 591 nvc_info.c:396] missing compat32 library libnvidia-nscq.so
W0913 20:19:32.960687 591 nvc_info.c:396] missing compat32 library libcuda.so
W0913 20:19:32.960691 591 nvc_info.c:396] missing compat32 library libnvidia-opencl.so
W0913 20:19:32.960695 591 nvc_info.c:396] missing compat32 library libnvidia-ptxjitcompiler.so
W0913 20:19:32.960700 591 nvc_info.c:396] missing compat32 library libnvidia-fatbinaryloader.so
W0913 20:19:32.960704 591 nvc_info.c:396] missing compat32 library libnvidia-allocator.so
W0913 20:19:32.960708 591 nvc_info.c:396] missing compat32 library libnvidia-compiler.so
W0913 20:19:32.960714 591 nvc_info.c:396] missing compat32 library libnvidia-ngx.so
W0913 20:19:32.960719 591 nvc_info.c:396] missing compat32 library libvdpau_nvidia.so
W0913 20:19:32.960724 591 nvc_info.c:396] missing compat32 library libnvidia-encode.so
W0913 20:19:32.960727 591 nvc_info.c:396] missing compat32 library libnvidia-opticalflow.so
W0913 20:19:32.960731 591 nvc_info.c:396] missing compat32 library libnvcuvid.so
W0913 20:19:32.960735 591 nvc_info.c:396] missing compat32 library libnvidia-eglcore.so
W0913 20:19:32.960739 591 nvc_info.c:396] missing compat32 library libnvidia-glcore.so
W0913 20:19:32.960744 591 nvc_info.c:396] missing compat32 library libnvidia-tls.so
W0913 20:19:32.960749 591 nvc_info.c:396] missing compat32 library libnvidia-glsi.so
W0913 20:19:32.960753 591 nvc_info.c:396] missing compat32 library libnvidia-fbc.so
W0913 20:19:32.960757 591 nvc_info.c:396] missing compat32 library libnvidia-ifr.so
W0913 20:19:32.960762 591 nvc_info.c:396] missing compat32 library libnvidia-rtcore.so
W0913 20:19:32.960765 591 nvc_info.c:396] missing compat32 library libnvoptix.so
W0913 20:19:32.960769 591 nvc_info.c:396] missing compat32 library libGLX_nvidia.so
W0913 20:19:32.960773 591 nvc_info.c:396] missing compat32 library libEGL_nvidia.so
W0913 20:19:32.960778 591 nvc_info.c:396] missing compat32 library libGLESv2_nvidia.so
W0913 20:19:32.960783 591 nvc_info.c:396] missing compat32 library libGLESv1_CM_nvidia.so
W0913 20:19:32.960788 591 nvc_info.c:396] missing compat32 library libnvidia-glvkspirv.so
W0913 20:19:32.960792 591 nvc_info.c:396] missing compat32 library libnvidia-cbl.so
I0913 20:19:32.961010 591 nvc_info.c:297] selecting /usr/bin/nvidia-smi
I0913 20:19:32.961030 591 nvc_info.c:297] selecting /usr/bin/nvidia-debugdump
I0913 20:19:32.961046 591 nvc_info.c:297] selecting /usr/bin/nvidia-persistenced
I0913 20:19:32.961072 591 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-control
I0913 20:19:32.961092 591 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-server
W0913 20:19:32.961139 591 nvc_info.c:418] missing binary nv-fabricmanager
I0913 20:19:32.961177 591 nvc_info.c:512] listing device /dev/nvidiactl
I0913 20:19:32.961184 591 nvc_info.c:512] listing device /dev/nvidia-uvm
I0913 20:19:32.961191 591 nvc_info.c:512] listing device /dev/nvidia-uvm-tools
I0913 20:19:32.961196 591 nvc_info.c:512] listing device /dev/nvidia-modeset
W0913 20:19:32.961223 591 nvc_info.c:342] missing ipc /var/run/nvidia-persistenced/socket
W0913 20:19:32.961247 591 nvc_info.c:342] missing ipc /var/run/nvidia-fabricmanager/socket
W0913 20:19:32.961264 591 nvc_info.c:342] missing ipc /tmp/nvidia-mps
I0913 20:19:32.961270 591 nvc_info.c:805] requesting device information with ''
I0913 20:19:32.966964 591 nvc_info.c:697] listing device /dev/nvidia0 (GPU-06986d8e-47c3-467c-c6bc-0a30ae3fbd30 at 00000000:01:00.0)
NVRM version: 470.63.01
CUDA version: 11.4
Device Index: 0
Device Minor: 0
Model: NVIDIA GeForce GTX 1080 Ti
Brand: GeForce
GPU UUID: GPU-06986d8e-47c3-467c-c6bc-0a30ae3fbd30
Bus Location: 00000000:01:00.0
Architecture: 6.1
I0913 20:19:32.966997 591 nvc.c:423] shutting down library context
I0913 20:19:32.967215 592 driver.c:163] terminating driver service
I0913 20:19:32.967500 591 driver.c:203] driver service terminated successfully
- [X] Kernel version from
uname -a
Linux demo4 5.11.0-34-generic #36-Ubuntu SMP Thu Aug 26 19:22:09 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- [X] Any relevant kernel output lines from
dmesg
[ 1517.263463] docker0: port 1(vethce5fe56) entered blocking state
[ 1517.263475] docker0: port 1(vethce5fe56) entered disabled state
[ 1517.263652] device vethce5fe56 entered promiscuous mode
[ 1517.622529] docker0: port 1(vethce5fe56) entered disabled state
[ 1517.628590] device vethce5fe56 left promiscuous mode
[ 1517.628603] docker0: port 1(vethce5fe56) entered disabled state
- [X] Driver information from
nvidia-smi -a
from the host:
==============NVSMI LOG==============
Timestamp : Mon Sep 13 22:23:14 2021
Driver Version : 470.63.01
CUDA Version : 11.4
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-06986d8e-47c3-467c-c6bc-0a30ae3fbd30
Minor Number : 0
VBIOS Version : 86.02.39.00.FF
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x376A1458
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 16 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11175 MiB
Used : 178 MiB
Free : 10997 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 28 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
GPU Target Temperature : 84 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 16.48 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 375.00 W
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 2037 MHz
SM : 2037 MHz
Memory : 5616 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 5026
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 167 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 5324
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 8 MiB
from the LXD container:
==============NVSMI LOG==============
Timestamp : Mon Sep 13 20:23:59 2021
Driver Version : 470.63.01
CUDA Version : 11.4
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-06986d8e-47c3-467c-c6bc-0a30ae3fbd30
Minor Number : 0
VBIOS Version : 86.02.39.00.FF
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x376A1458
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 16 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11175 MiB
Used : 178 MiB
Free : 10997 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 28 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
GPU Target Temperature : 84 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 17.30 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 375.00 W
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 2037 MHz
SM : 2037 MHz
Memory : 5616 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes : None
- [X] Docker version from
docker version
Client: Docker Engine - Community
Version: 20.10.8
API version: 1.41
Go version: go1.16.6
Git commit: 3967b7d
Built: Fri Jul 30 19:53:57 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.8
API version: 1.41 (minimum version 1.12)
Go version: go1.16.6
Git commit: 75249d8
Built: Fri Jul 30 19:52:06 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.9
GitCommit: e25210fe30a0a703442421b0f60afac609f950a3
runc:
Version: 1.0.1
GitCommit: v1.0.1-0-g4144b63
docker-init:
Version: 0.19.0
GitCommit: de40ad0
- [X] NVIDIA packages version from
dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
from the host:
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================-==========================-============-=========================================================
un libgldispatch0-nvidia <none> <none> (no description available)
ii libnvidia-cfg1-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any <none> <none> (no description available)
un libnvidia-common <none> <none> (no description available)
ii libnvidia-common-470 470.63.01-0ubuntu0.21.04.2 all Shared files used by the NVIDIA libraries
un libnvidia-compute <none> <none> (no description available)
rc libnvidia-compute-460:amd64 460.73.01-0ubuntu1 amd64 NVIDIA libcompute package
rc libnvidia-compute-465:amd64 465.19.01-0ubuntu1 amd64 NVIDIA libcompute package
ii libnvidia-compute-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA libcompute package
ii libnvidia-container-tools 1.3.3-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.3.3-1 amd64 NVIDIA container runtime library
un libnvidia-decode <none> <none> (no description available)
ii libnvidia-decode-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA Video Decoding runtime libraries
un libnvidia-encode <none> <none> (no description available)
ii libnvidia-encode-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVENC Video Encoding runtime library
un libnvidia-extra <none> <none> (no description available)
ii libnvidia-extra-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 Extra libraries for the NVIDIA driver
un libnvidia-fbc1 <none> <none> (no description available)
ii libnvidia-fbc1-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
un libnvidia-gl <none> <none> (no description available)
ii libnvidia-gl-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un libnvidia-ifr1 <none> <none> (no description available)
ii libnvidia-ifr1-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA OpenGL-based Inband Frame Readback runtime library
un libnvidia-ml1 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-390 <none> <none> (no description available)
un nvidia-common <none> <none> (no description available)
un nvidia-compute-utils <none> <none> (no description available)
rc nvidia-compute-utils-460 460.73.01-0ubuntu1 amd64 NVIDIA compute utilities
rc nvidia-compute-utils-465 465.19.01-0ubuntu1 amd64 NVIDIA compute utilities
ii nvidia-compute-utils-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA compute utilities
ii nvidia-container-runtime 3.4.2-1 amd64 NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.4.2-1 amd64 NVIDIA container runtime hook
rc nvidia-dkms-460 460.73.01-0ubuntu1 amd64 NVIDIA DKMS package
rc nvidia-dkms-465 465.19.01-0ubuntu1 amd64 NVIDIA DKMS package
ii nvidia-dkms-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA DKMS package
un nvidia-dkms-kernel <none> <none> (no description available)
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.5.0-1 all nvidia-docker CLI wrapper
ii nvidia-driver-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA driver metapackage
un nvidia-driver-binary <none> <none> (no description available)
un nvidia-kernel-common <none> <none> (no description available)
rc nvidia-kernel-common-460 460.73.01-0ubuntu1 amd64 Shared files used with the kernel module
rc nvidia-kernel-common-465 465.19.01-0ubuntu1 amd64 Shared files used with the kernel module
ii nvidia-kernel-common-470 470.63.01-0ubuntu0.21.04.2 amd64 Shared files used with the kernel module
un nvidia-kernel-source <none> <none> (no description available)
un nvidia-kernel-source-460 <none> <none> (no description available)
un nvidia-kernel-source-465 <none> <none> (no description available)
ii nvidia-kernel-source-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA kernel source package
un nvidia-libopencl1-dev <none> <none> (no description available)
ii nvidia-modprobe 470.57.02-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
un nvidia-opencl-icd <none> <none> (no description available)
un nvidia-persistenced <none> <none> (no description available)
ii nvidia-prime 0.8.16.1 all Tools to enable NVIDIA's Prime
ii nvidia-settings 470.57.02-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
un nvidia-settings-binary <none> <none> (no description available)
un nvidia-smi <none> <none> (no description available)
un nvidia-utils <none> <none> (no description available)
ii nvidia-utils-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA driver support binaries
ii xserver-xorg-video-nvidia-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA binary Xorg driver
from within the LXD container:
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================-==========================-============-=========================================================
un libgldispatch0-nvidia <none> <none> (no description available)
ii libnvidia-cfg1-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any <none> <none> (no description available)
un libnvidia-common <none> <none> (no description available)
ii libnvidia-common-470 470.63.01-0ubuntu0.21.04.2 all Shared files used by the NVIDIA libraries
un libnvidia-compute <none> <none> (no description available)
ii libnvidia-compute-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA libcompute package
ii libnvidia-container-tools 1.5.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.5.0-1 amd64 NVIDIA container runtime library
un libnvidia-decode <none> <none> (no description available)
ii libnvidia-decode-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA Video Decoding runtime libraries
un libnvidia-encode <none> <none> (no description available)
ii libnvidia-encode-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVENC Video Encoding runtime library
un libnvidia-extra <none> <none> (no description available)
ii libnvidia-extra-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 Extra libraries for the NVIDIA driver
un libnvidia-fbc1 <none> <none> (no description available)
ii libnvidia-fbc1-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
un libnvidia-gl <none> <none> (no description available)
ii libnvidia-gl-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un libnvidia-ifr1 <none> <none> (no description available)
ii libnvidia-ifr1-470:amd64 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA OpenGL-based Inband Frame Readback runtime library
un libnvidia-ml1 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-390 <none> <none> (no description available)
un nvidia-compute-utils <none> <none> (no description available)
ii nvidia-compute-utils-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA compute utilities
ii nvidia-container-runtime 3.5.0-1 amd64 NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.5.1-1 amd64 NVIDIA container runtime hook
ii nvidia-dkms-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA DKMS package
un nvidia-dkms-kernel <none> <none> (no description available)
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.6.0-1 all nvidia-docker CLI wrapper
ii nvidia-driver-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA driver metapackage
un nvidia-driver-binary <none> <none> (no description available)
un nvidia-kernel-common <none> <none> (no description available)
ii nvidia-kernel-common-470 470.63.01-0ubuntu0.21.04.2 amd64 Shared files used with the kernel module
un nvidia-kernel-source <none> <none> (no description available)
ii nvidia-kernel-source-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA kernel source package
un nvidia-opencl-icd <none> <none> (no description available)
un nvidia-persistenced <none> <none> (no description available)
un nvidia-prime <none> <none> (no description available)
un nvidia-settings <none> <none> (no description available)
un nvidia-smi <none> <none> (no description available)
un nvidia-utils <none> <none> (no description available)
ii nvidia-utils-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA driver support binaries
ii xserver-xorg-video-nvidia-470 470.63.01-0ubuntu0.21.04.2 amd64 NVIDIA binary Xorg driver
- [X] NVIDIA container library version from
nvidia-container-cli -V
version: 1.5.0
build date: 2021-09-02T08:39+00:00
build revision: 4699c1b8b4991b6d869ea403e109291653bb040b
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
- [X] NVIDIA container library logs (see troubleshooting)
cat /var/log/nvidia-container-toolkit.log
-- WARNING, the following logs are for debugging purposes only --
I0913 20:33:29.853991 1004 nvc.c:372] initializing library context (version=1.5.0, build=4699c1b8b4991b6d869ea403e109291653bb040b)
I0913 20:33:29.854198 1004 nvc.c:346] using root /
I0913 20:33:29.854244 1004 nvc.c:347] using ldcache /etc/ld.so.cache
I0913 20:33:29.854281 1004 nvc.c:348] using unprivileged user 65534:65534
I0913 20:33:29.854338 1004 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0913 20:33:29.854676 1004 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W0913 20:33:29.854752 1004 nvc.c:249] skipping kernel modules load due to user namespace
I0913 20:33:29.854976 1010 driver.c:101] starting driver service
I0913 20:33:29.863469 1004 nvc_container.c:388] configuring container with 'compute utility supervised'
I0913 20:33:29.863950 1004 nvc_container.c:236] selecting /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/local/cuda-11.4/compat/libcuda.so.470.57.02
I0913 20:33:29.864131 1004 nvc_container.c:236] selecting /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/local/cuda-11.4/compat/libnvidia-ptxjitcompiler.so.470.57.02
I0913 20:33:29.864518 1004 nvc_container.c:408] setting pid to 998
I0913 20:33:29.864566 1004 nvc_container.c:409] setting rootfs to /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0
I0913 20:33:29.864619 1004 nvc_container.c:410] setting owner to 0:0
I0913 20:33:29.864656 1004 nvc_container.c:411] setting bins directory to /usr/bin
I0913 20:33:29.864693 1004 nvc_container.c:412] setting libs directory to /usr/lib/x86_64-linux-gnu
I0913 20:33:29.864728 1004 nvc_container.c:413] setting libs32 directory to /usr/lib/i386-linux-gnu
I0913 20:33:29.864764 1004 nvc_container.c:414] setting cudart directory to /usr/local/cuda
I0913 20:33:29.864800 1004 nvc_container.c:415] setting ldconfig to @/sbin/ldconfig.real (host relative)
I0913 20:33:29.864847 1004 nvc_container.c:416] setting mount namespace to /proc/998/ns/mnt
I0913 20:33:29.864883 1004 nvc_container.c:418] setting devices cgroup to /sys/fs/cgroup/devices/docker/dd7f4ee43c878e6ce63ccaba0c9b9a10d2834add60afb23ae14db0d2f90fb694
I0913 20:33:29.864928 1004 nvc_info.c:750] requesting driver information with ''
I0913 20:33:29.866900 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.470.63.01
I0913 20:33:29.867044 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.470.63.01
I0913 20:33:29.867174 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.470.63.01
I0913 20:33:29.867286 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.470.63.01
I0913 20:33:29.867431 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.470.63.01
I0913 20:33:29.867572 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.470.63.01
I0913 20:33:29.867698 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.470.63.01
I0913 20:33:29.867809 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.470.63.01
I0913 20:33:29.867957 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.470.63.01
I0913 20:33:29.868097 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.470.63.01
I0913 20:33:29.868202 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.470.63.01
I0913 20:33:29.868324 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
I0913 20:33:29.868435 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.470.63.01
I0913 20:33:29.868578 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.470.63.01
I0913 20:33:29.868720 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.470.63.01
I0913 20:33:29.868826 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.470.63.01
I0913 20:33:29.868953 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.470.63.01
I0913 20:33:29.869096 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.470.63.01
I0913 20:33:29.869204 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.470.63.01
I0913 20:33:29.869348 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.470.63.01
I0913 20:33:29.869595 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
I0913 20:33:29.869810 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.470.63.01
I0913 20:33:29.869925 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.470.63.01
I0913 20:33:29.870033 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.470.63.01
I0913 20:33:29.870168 1004 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.470.63.01
W0913 20:33:29.870258 1004 nvc_info.c:392] missing library libnvidia-nscq.so
W0913 20:33:29.870301 1004 nvc_info.c:392] missing library libnvidia-fatbinaryloader.so
W0913 20:33:29.870337 1004 nvc_info.c:392] missing library libvdpau_nvidia.so
W0913 20:33:29.870373 1004 nvc_info.c:396] missing compat32 library libnvidia-ml.so
W0913 20:33:29.870409 1004 nvc_info.c:396] missing compat32 library libnvidia-cfg.so
W0913 20:33:29.870444 1004 nvc_info.c:396] missing compat32 library libnvidia-nscq.so
W0913 20:33:29.870494 1004 nvc_info.c:396] missing compat32 library libcuda.so
W0913 20:33:29.870531 1004 nvc_info.c:396] missing compat32 library libnvidia-opencl.so
W0913 20:33:29.870567 1004 nvc_info.c:396] missing compat32 library libnvidia-ptxjitcompiler.so
W0913 20:33:29.870602 1004 nvc_info.c:396] missing compat32 library libnvidia-fatbinaryloader.so
W0913 20:33:29.870638 1004 nvc_info.c:396] missing compat32 library libnvidia-allocator.so
W0913 20:33:29.870673 1004 nvc_info.c:396] missing compat32 library libnvidia-compiler.so
W0913 20:33:29.870722 1004 nvc_info.c:396] missing compat32 library libnvidia-ngx.so
W0913 20:33:29.870758 1004 nvc_info.c:396] missing compat32 library libvdpau_nvidia.so
W0913 20:33:29.870795 1004 nvc_info.c:396] missing compat32 library libnvidia-encode.so
W0913 20:33:29.870830 1004 nvc_info.c:396] missing compat32 library libnvidia-opticalflow.so
W0913 20:33:29.870866 1004 nvc_info.c:396] missing compat32 library libnvcuvid.so
W0913 20:33:29.870902 1004 nvc_info.c:396] missing compat32 library libnvidia-eglcore.so
W0913 20:33:29.870949 1004 nvc_info.c:396] missing compat32 library libnvidia-glcore.so
W0913 20:33:29.870985 1004 nvc_info.c:396] missing compat32 library libnvidia-tls.so
W0913 20:33:29.871021 1004 nvc_info.c:396] missing compat32 library libnvidia-glsi.so
W0913 20:33:29.871057 1004 nvc_info.c:396] missing compat32 library libnvidia-fbc.so
W0913 20:33:29.871092 1004 nvc_info.c:396] missing compat32 library libnvidia-ifr.so
W0913 20:33:29.871128 1004 nvc_info.c:396] missing compat32 library libnvidia-rtcore.so
W0913 20:33:29.871176 1004 nvc_info.c:396] missing compat32 library libnvoptix.so
W0913 20:33:29.871213 1004 nvc_info.c:396] missing compat32 library libGLX_nvidia.so
W0913 20:33:29.871248 1004 nvc_info.c:396] missing compat32 library libEGL_nvidia.so
W0913 20:33:29.871283 1004 nvc_info.c:396] missing compat32 library libGLESv2_nvidia.so
W0913 20:33:29.871319 1004 nvc_info.c:396] missing compat32 library libGLESv1_CM_nvidia.so
W0913 20:33:29.871354 1004 nvc_info.c:396] missing compat32 library libnvidia-glvkspirv.so
W0913 20:33:29.871402 1004 nvc_info.c:396] missing compat32 library libnvidia-cbl.so
I0913 20:33:29.871985 1004 nvc_info.c:297] selecting /usr/bin/nvidia-smi
I0913 20:33:29.872089 1004 nvc_info.c:297] selecting /usr/bin/nvidia-debugdump
I0913 20:33:29.872170 1004 nvc_info.c:297] selecting /usr/bin/nvidia-persistenced
I0913 20:33:29.872269 1004 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-control
I0913 20:33:29.872342 1004 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-server
W0913 20:33:29.872652 1004 nvc_info.c:418] missing binary nv-fabricmanager
I0913 20:33:29.872750 1004 nvc_info.c:512] listing device /dev/nvidiactl
I0913 20:33:29.872793 1004 nvc_info.c:512] listing device /dev/nvidia-uvm
I0913 20:33:29.872829 1004 nvc_info.c:512] listing device /dev/nvidia-uvm-tools
I0913 20:33:29.872865 1004 nvc_info.c:512] listing device /dev/nvidia-modeset
W0913 20:33:29.872945 1004 nvc_info.c:342] missing ipc /var/run/nvidia-persistenced/socket
W0913 20:33:29.873024 1004 nvc_info.c:342] missing ipc /var/run/nvidia-fabricmanager/socket
W0913 20:33:29.873107 1004 nvc_info.c:342] missing ipc /tmp/nvidia-mps
I0913 20:33:29.873149 1004 nvc_info.c:805] requesting device information with ''
I0913 20:33:29.880072 1004 nvc_info.c:697] listing device /dev/nvidia0 (GPU-06986d8e-47c3-467c-c6bc-0a30ae3fbd30 at 00000000:01:00.0)
I0913 20:33:29.880343 1004 nvc_mount.c:344] mounting tmpfs at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/proc/driver/nvidia
I0913 20:33:29.881942 1004 nvc_mount.c:112] mounting /usr/bin/nvidia-smi at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/bin/nvidia-smi
I0913 20:33:29.882284 1004 nvc_mount.c:112] mounting /usr/bin/nvidia-debugdump at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/bin/nvidia-debugdump
I0913 20:33:29.882570 1004 nvc_mount.c:112] mounting /usr/bin/nvidia-persistenced at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/bin/nvidia-persistenced
I0913 20:33:29.882896 1004 nvc_mount.c:112] mounting /usr/bin/nvidia-cuda-mps-control at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/bin/nvidia-cuda-mps-control
I0913 20:33:29.883229 1004 nvc_mount.c:112] mounting /usr/bin/nvidia-cuda-mps-server at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/bin/nvidia-cuda-mps-server
I0913 20:33:29.883827 1004 nvc_mount.c:112] mounting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.470.63.01 at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.470.63.01
I0913 20:33:29.884117 1004 nvc_mount.c:112] mounting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.470.63.01 at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.470.63.01
I0913 20:33:29.884449 1004 nvc_mount.c:112] mounting /usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01 at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
I0913 20:33:29.884795 1004 nvc_mount.c:112] mounting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.470.63.01 at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.470.63.01
I0913 20:33:29.885117 1004 nvc_mount.c:112] mounting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.470.63.01 at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.470.63.01
I0913 20:33:29.885396 1004 nvc_mount.c:112] mounting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.470.63.01 at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.470.63.01
I0913 20:33:29.885710 1004 nvc_mount.c:112] mounting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.470.63.01 at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.470.63.01
I0913 20:33:29.885866 1004 nvc_mount.c:524] creating symlink /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
I0913 20:33:29.886609 1004 nvc_mount.c:63] mounting /lib/firmware/nvidia/470.63.01 at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/usr/lib/firmware/nvidia/470.63.01
I0913 20:33:29.886913 1004 nvc_mount.c:208] mounting /dev/nvidiactl at /var/lib/docker/btrfs/subvolumes/be11006c908fb293162fe6b4ded3bdacc0858a9f4f82a98372c000d5e769f6e0/dev/nvidiactl
I0913 20:33:29.887090 1004 nvc_mount.c:499] whitelisting device node 195:255
I0913 20:33:29.889227 1004 nvc.c:423] shutting down library context
I0913 20:33:29.890167 1010 driver.c:163] terminating driver service
I0913 20:33:29.891254 1004 driver.c:203] driver service terminated successfully
- [X] Docker command, image and tag used
* within the privileged LXD container:
docker run --rm --gpus all --ipc=host nvidia/cuda:11.4.1-base-ubuntu20.04 nvidia-smi
Mon Sep 13 20:41:24 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 16% 27C P8 16W / 250W | 178MiB / 11175MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
* within the unprivileged LXD container:
docker run --rm --gpus all --ipc=host nvidia/cuda:11.4.1-base-ubuntu20.04 nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: write error: /sys/fs/cgroup/devices/docker/9e199f6f3e7e69766ce196d617b7e623f506c186b371fd732250ef8d1f1f0631/devices.allow: operation not permitted: unknown.
@waldekkot did you manage to solve it somehow? Thanks!
@iegorval unfortunately, there have been no changes to how it works...
We are currently in the process of re-architecting the nvidia-docker stack, and I'd be curious to know whether this issue is resolved by the new stack.
Can you try replacing your current nvidia-container-runtime binary with the "experimental" one from here:
docker cp $(docker create --rm nvcr.io/nvidia/k8s/container-toolkit:v1.8.0-rc.2-ubuntu18.04):/work/nvidia-container-runtime.experimental .
Then invoke docker using the NVIDIA_VISIBLE_DEVICES environment variable rather than the --gpus flag.
A quick update to this in case @waldekkot has moved on:
Container toolkit version v1.8.0-rc.2-ubuntu18.04 (as above) is now the standard install via apt if you've configured the experimental packages repo. With that version (or the binary pulled from the container above), the problem still exists as described. The error message is slightly more informative in that it now states "failed to add device rules":
docker -D run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: failed to add device rules: write /sys/fs/cgroup/devices/docker/ca6bba3e85ac368ca5310907cbcd9b2fd404c83077323cd84b49a3b541019785/devices.allow: operation not permitted: unknown.
The error is identical whether you use environment variables or --gpus as arguments.
This should have been fixed in v1.8.1.
Hmm, assuming the 1.9.0-1 release is ahead of that, it's still broken there. I confess I'm struggling to debug this; I'm happy to diagnose further if anyone can point me in the right direction:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: failed to add device rules: write /sys/fs/cgroup/devices/docker/f9352b6b081710baa40d6ba036102e79c228c063afc16ca88ef21212f02f0ad5/devices.allow: operation not permitted: unknown.
ERRO[0002] error waiting for container: context canceled
ii libnvidia-container-tools 1.9.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.9.0-1 amd64 NVIDIA container runtime library
ii nvidia-container-toolkit 1.9.0-1 amd64 NVIDIA container runtime hook
ii nvidia-docker2 2.10.0-1 all nvidia-docker CLI wrapper
Hmm. So this works for you if you downgrade to, say, libnvidia-container v1.7.0, but it's broken on the latest?
Looking more closely at the linked issue, it seems that this is failing "by design" at the moment (and would also fail on older versions of libnvidia-container not just the newest one).
That error should really be non-fatal in the case of nested containers. It may be worth filing an issue against nvidia-container to have them relax error handling on this particular case.
Unprivileged containers aren't allowed to modify devices.allow/devices.deny, but that doesn't mean the device in question isn't already allowed (as it is in this case).
I think what you want to do is probably just uncomment no-cgroups = true in your /etc/nvidia-container-runtime/config.toml file.
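In concrete terms, a sketch of that change (the config path comes from the message above; the exact default line and the need to restart Docker are assumptions based on typical packaging, so check your own file first):

```shell
# The option usually ships commented out as "#no-cgroups = false" under the
# [nvidia-container-cli] section; flip it to an active "no-cgroups = true".
sudo sed -i 's/^#no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml

# Restart Docker so the NVIDIA runtime hook picks up the new setting.
sudo systemctl restart docker
```

With no-cgroups enabled, nvidia-container-cli skips the devices-cgroup edit that the unprivileged LXD container is not permitted to make.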
Excellent! Thank you @klueska, that fixed the issue there.
For reference, nvidia-docker still does not work in unprivileged mode as above without some more work: it's necessary to set raw.apparmor values within LXD to allow access to /proc/driver/nvidia/gpus/0000:bus_id.0, as otherwise nvidia-container-cli fails to mount. That's very much an LXC thing rather than an nvidia-docker issue, though.
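For anyone else hitting that AppArmor denial, the kind of rule involved looks roughly like the fragment below, applied via the container's raw.apparmor key. Treat this as a sketch only: the wildcard scope is an assumption, not something verified in this thread, and you should tailor it to the exact path AppArmor reports as denied.

```
# Hypothetical AppArmor rule fragment (syntax per AppArmor file rules):
# allow the nested runtime to read NVIDIA entries under /proc.
/proc/driver/nvidia/gpus/** r,
```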
Thanks again.