
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: nvml error: function not found: unknown.

aiaiai-czh opened this issue 4 years ago • 28 comments

docker run --gpus all --rm debian:10-slim nvidia-smi
Unable to find image 'debian:10-slim' locally
10-slim: Pulling from library/debian
f7ec5a41d630: Pull complete
Digest: sha256:b586cf8c850cada85a47599f08eb34ede4a7c473551fd7c68cbf20ce5f8dbbf1
Status: Downloaded newer image for debian:10-slim
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: nvml error: function not found: unknown.

nvidia-smi
Wed Apr 21 11:26:56 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.134                Driver Version: 367.134                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K2             Off  | 0000:1A:00.0     Off |                  Off |
| N/A   46C    P0    44W / 117W |      0MiB /  4033MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K2             Off  | 0000:1B:00.0     Off |                  Off |
| N/A   42C    P0    42W / 117W |      0MiB /  4033MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K2             Off  | 0000:B1:00.0     Off |                  Off |
| N/A   46C    P0    45W / 117W |      0MiB /  4033MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K2             Off  | 0000:B2:00.0     Off |                  Off |
| N/A   42C    P0    39W / 117W |      0MiB /  4033MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvidia-container-cli -V
version: 1.3.3
build date: 2021-02-05T13:29+00:00
build revision: bd9fc3f2b642345301cb2e23de07ec5386232317
build compiler: gcc-5 5.4.0 20160609
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

uname -a
Linux xidian-S2600WFT 4.15.0-142-generic #146~16.04.1-Ubuntu SMP Tue Apr 13 09:27:15 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0421 03:29:14.518588 15027 nvc.c:372] initializing library context (version=1.3.3, build=bd9fc3f2b642345301cb2e23de07ec5386232317)
I0421 03:29:14.518653 15027 nvc.c:346] using root /
I0421 03:29:14.518667 15027 nvc.c:347] using ldcache /etc/ld.so.cache
I0421 03:29:14.518678 15027 nvc.c:348] using unprivileged user 1000:1000
I0421 03:29:14.518740 15027 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0421 03:29:14.518880 15027 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W0421 03:29:14.528026 15028 nvc.c:269] failed to set inheritable capabilities
W0421 03:29:14.528133 15028 nvc.c:270] skipping kernel modules load due to failure
I0421 03:29:14.528714 15029 driver.c:101] starting driver service
I0421 03:29:16.399164 15027 nvc_info.c:680] requesting driver information with ''
nvidia-container-cli: detection error: nvml error: function not found
I0421 03:29:16.399574 15027 nvc.c:427] shutting down library context
I0421 03:29:16.882944 15029 driver.c:156] terminating driver service
I0421 03:29:16.883197 15027 driver.c:196] driver service terminated successfully

docker --version
Docker version 20.10.6, build 370c289

aiaiai-czh avatar Apr 21 '21 03:04 aiaiai-czh

Hi @aiaiai-czh

It seems as if the driver version on the host is quite old and that one of the NVML functions that we use to determine which devices are installed on the system is not available for that driver version. Would you be able to update the host driver?

elezar avatar Apr 21 '21 14:04 elezar

Hi, my GPU is an NVIDIA GRID K2, and its driver (shown in the attached image) is the latest version available for it...

aiaiai-czh avatar Apr 21 '21 14:04 aiaiai-czh

Same issue. Attempting to use the CUDA driver on WSL 2.

kellerbaum avatar May 11 '21 15:05 kellerbaum

Same here. Exact same error message and practically identical output from diagnostics.

tonymattheys avatar May 14 '21 01:05 tonymattheys

Same here. Bump

qiangxinglin avatar May 17 '21 12:05 qiangxinglin

Unfortunately the driver version of 367.134 is older than our supported minimum version of 418.81.07.
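A quick way to check a host against this minimum is a version-sort comparison. This is only a sketch: `driver_ok` is a hypothetical helper, the 418.81.07 threshold is the one quoted in this thread, and the comparison relies on GNU coreutils' `sort -V`.

```shell
# Sketch: check a host driver version against the minimum supported by
# libnvidia-container (418.81.07, per this thread).
MIN_DRIVER="418.81.07"

driver_ok() {
  # Succeeds when $1 >= $MIN_DRIVER, using GNU sort's version ordering.
  [ "$(printf '%s\n%s\n' "$MIN_DRIVER" "$1" | sort -V | head -n1)" = "$MIN_DRIVER" ]
}

# On a live host you would query the real version, e.g.:
#   DRIVER=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
driver_ok "367.134" && echo "supported" || echo "too old"    # the GRID K2 driver above
driver_ok "460.27.04" && echo "supported" || echo "too old"
```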

elezar avatar May 17 '21 13:05 elezar

Unfortunately the driver version of 367.134 is older than our supported minimum version of 418.81.07.

I think you misunderstood the problem. I believe we all have this problem with Docker on Windows under WSL2. The kernel and nvidia-docker2 versions are the latest (I just followed the user guide), so I have no idea why the version here is 367 rather than something newer (e.g. 465 or 470).

qiangxinglin avatar May 17 '21 13:05 qiangxinglin

@qiangxinglin This thread's issue is on native Linux and has nothing to do with WSL2. For a similar issue under WSL2, see #1496

onomatopellan avatar May 17 '21 13:05 onomatopellan

@aiaiai-czh posted output showing the 367 driver in use with NVIDIA GRID K2 devices. No mention was made of WSL as part of the original report.

@qiangxinglin, if your issue, and those of @kellerbaum and @tonymattheys, occur under WSL, they may be related to a known issue discussed in #1496

elezar avatar May 17 '21 13:05 elezar

Thanks @onomatopellan for also providing the link.

elezar avatar May 17 '21 13:05 elezar

Docker desktop version 3.3.0 works correctly. I accidentally updated to 3.3.3 (haven't checked intermediate versions) - and got the same error.

I uninstalled the latest 3.3.3 - reverted to 3.3.0 - and now it works again. I am definitely not updating again :)

gurvesh avatar May 19 '21 06:05 gurvesh

Docker desktop version 3.3.0 works correctly. I accidentally updated to 3.3.3 (haven't checked intermediate versions) - and got the same error.

I uninstalled the latest 3.3.3 - reverted to 3.3.0 - and now it works again. I am definitely not updating again :)

Thanks a ton for the downgrade tip! It worked for me as well. Here is a link to past releases: https://docs.docker.com/docker-for-mac/release-notes/#docker-desktop-330

cbess avatar May 20 '21 01:05 cbess

Getting the exact same issue.

  1. Latest Windows Insider Dev Channel Build 21387.1 (Windows 10 Home)

  2. Latest NVIDIA Driver

$ nvidia-smi.exe
Sun May 23 09:33:06 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.14       Driver Version: 470.14       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  WDDM | 00000000:01:00.0  On |                  N/A |
| 39%   39C    P8    12W / 180W |   1088MiB /  8192MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       932    C+G   ...wekyb3d8bbwe\Video.UI.exe    N/A      |
|    0   N/A  N/A      1348    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      3108    C+G   ...bbwe\Microsoft.Photos.exe    N/A      |
|    0   N/A  N/A      6436    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A      6532    C+G   ...in7x64\steamwebhelper.exe    N/A      |
|    0   N/A  N/A      8412    C+G   ...b3d8bbwe\WinStore.App.exe    N/A      |
|    0   N/A  N/A      8960    C+G   ...artMenuExperienceHost.exe    N/A      |
|    0   N/A  N/A      8988    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A      9848    C+G   ...ekyb3d8bbwe\YourPhone.exe    N/A      |
|    0   N/A  N/A     10820    C+G   ...cw5n1h2txyewy\LockApp.exe    N/A      |
|    0   N/A  N/A     12204    C+G   ...perience\NVIDIA Share.exe    N/A      |
|    0   N/A  N/A     13800    C+G   ...me\Application\chrome.exe    N/A      |
|    0   N/A  N/A     14192    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A     14592    C+G   ...ON Tools Lite\DTAgent.exe    N/A      |
|    0   N/A  N/A     15828    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A     17876    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
+-----------------------------------------------------------------------------+

  3. Not using Docker Desktop but using nvidia-docker2 installed inside of the WSL2 Ubuntu 20.04 distro (followed this guide https://weblogs.asp.net/dixin/setup-and-use-cuda-and-tensorflow-in-windows-subsystem-for-linux-2)

  4. CUDA samples compiling correctly and running on the GPU

$ ./concurrentKernels
[./concurrentKernels] - Starting...
GPU Device 0: "Pascal" with compute capability 6.1

Detected Compute SM 6.1 hardware with 20 multi-processors
Expected time for serial execution of 8 kernels = 0.080s
Expected time for concurrent execution of 8 kernels = 0.010s
Measured time for sample = 0.011s
Test passed

  5. Exact error:

$ docker run -it --gpus all -p 8889:8888 tensorflow/tensorflow:latest-gpu-jupyter

Unable to find image 'tensorflow/tensorflow:latest-gpu-jupyter' locally
latest-gpu-jupyter: Pulling from tensorflow/tensorflow
6e0aa5e7af40: Pull complete
d47239a868b3: Pull complete
49cbb10cca85: Pull complete
4450dd082e0f: Pull complete
629fc5fa5e16: Pull complete
70ab367c7b71: Pull complete
958f536b8e20: Pull complete
ce4ddd54cd82: Pull complete
42e4800e18af: Pull complete
e74268b08545: Pull complete
2c5fb9465126: Pull complete
a113115fe9db: Pull complete
dcd6c77ca4f1: Pull complete
888e5acf85d5: Pull complete
0284cc2a81ed: Pull complete
f9956769a9b0: Pull complete
ee99a9f145ae: Pull complete
a19aa5960d84: Pull complete
890a01759293: Pull complete
8be79e9b80f5: Pull complete
4191db676d38: Pull complete
d354bf619b5f: Pull complete
2c9335ed06cb: Pull complete
a89ba501d7c4: Pull complete
2b846a0aef82: Pull complete
ab726351486a: Pull complete
3320194b9b59: Pull complete
70acfd9b6f8d: Pull complete
9fa6985c496a: Pull complete
066a34d6bb64: Pull complete
ca75e23e0d10: Pull complete
Digest: sha256:43211fc57f947601e335d3bd479316a53582f9b818308fd39dad3b90e17f92e3
Status: Downloaded newer image for tensorflow/tensorflow:latest-gpu-jupyter
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

sumandeepb avatar May 23 '21 04:05 sumandeepb

@sumandeepb - It's a Docker version issue. Run "docker --version", and if you see anything higher than 20.10.5, this error will be there.

It's just easier now to install Docker Desktop 3.3.0 (not any higher). See this:

https://forums.developer.nvidia.com/t/guide-to-run-cuda-wsl-docker-with-latest-versions-21382-windows-build-470-14-nvidia
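The 20.10.5 cutoff can be checked mechanically. This is only a sketch: `docker_ok` is a hypothetical helper, and the cutoff is this thread's observation rather than an official limit.

```shell
# Sketch: check whether a Docker engine version is at or below the
# 20.10.5 cutoff reported in this thread (GNU sort -V ordering).
LAST_GOOD="20.10.5"

docker_ok() {
  # Succeeds when $1 <= $LAST_GOOD.
  [ "$(printf '%s\n%s\n' "$LAST_GOOD" "$1" | sort -V | tail -n1)" = "$LAST_GOOD" ]
}

# On a live host: VER=$(docker version --format '{{.Server.Version}}')
docker_ok "20.10.5" && echo "should work" || echo "likely affected"
docker_ok "20.10.6" && echo "should work" || echo "likely affected"
```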

gurvesh avatar May 23 '21 04:05 gurvesh

Resolved by downgrading:

$ sudo apt install nvidia-docker2:amd64=2.5.0-1 \
           libnvidia-container-tools:amd64=1.3.3-1 \
           nvidia-container-runtime:amd64=3.4.2-1 \
           libnvidia-container1:amd64=1.3.3-1 \
           nvidia-container-toolkit:amd64=1.4.2-1

as suggested here:

https://github.com/NVIDIA/nvidia-docker/issues/1496
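One follow-up worth noting: a plain `apt upgrade` can pull these packages forward again and reintroduce the error. A hedged option is to pin the downgraded versions with an apt preferences fragment (the file name below is arbitrary; the version strings are the ones from the command above, i.e. this thread's workaround, not a general recommendation):

```
# /etc/apt/preferences.d/nvidia-docker-pin
# Pin the downgraded packages so "apt upgrade" does not pull them forward.
Package: nvidia-docker2
Pin: version 2.5.0-1
Pin-Priority: 1001

Package: libnvidia-container1
Pin: version 1.3.3-1
Pin-Priority: 1001

Package: libnvidia-container-tools
Pin: version 1.3.3-1
Pin-Priority: 1001

Package: nvidia-container-runtime
Pin: version 3.4.2-1
Pin-Priority: 1001

Package: nvidia-container-toolkit
Pin: version 1.4.2-1
Pin-Priority: 1001
```

A lighter-weight alternative is `sudo apt-mark hold` on the same package names; remove the pin (or unhold) once a fixed release is available.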

sumandeepb avatar May 23 '21 04:05 sumandeepb

@sumandeepb - It's a Docker version issue. Run "docker --version", and if you see anything higher than 20.10.5, this error will be there.

It's just easier now to install Docker Desktop 3.3.0 (not any higher). See this:

https://forums.developer.nvidia.com/t/guide-to-run-cuda-wsl-docker-with-latest-versions-21382-windows-build-470-14-nvidia

I am not using Docker Desktop for Windows.

I am using Docker installed natively inside the WSL2 Ubuntu 20.04 distro.

sumandeepb avatar May 23 '21 05:05 sumandeepb

@sumandeepb - I understand. I was just suggesting that you no longer need to install Docker via the complicated steps in the NVIDIA guide; Docker Desktop actually works pretty well, and it also lets you run the same containers (with GPU support) from Windows directly.

In any case, as you found, it's a Docker version issue. I hope they sort it out!

gurvesh avatar May 23 '21 05:05 gurvesh

@gurvesh - Yes, definitely a Docker issue. The latest versions equally affect Docker Desktop (> 3.3.0 does not work) and nvidia-docker2 (> 2.5.0-1 does not work) Linux installs.

Thanks for the article link. For now I will keep what is working, but later I will certainly try Docker Desktop and see if it works out for me.

sumandeepb avatar May 23 '21 05:05 sumandeepb

I've also encountered this error, using Docker Desktop for Windows 10. And for some reason, downgrading to 3.3.0 hasn't changed the error at all. nvidia-smi gives me proper output (SMI v 470.14, CUDA v 11.3), and docker -v gives me "20.10.5, build 55c4c88"

sordidlist avatar Jun 01 '21 07:06 sordidlist

@imprint-extract For what reason? Did you reboot your pc after downgrading?

qiangxinglin avatar Jun 01 '21 07:06 qiangxinglin

Yes, I rebooted after downgrading and also tried deleting and recreating the image.

sordidlist avatar Jun 01 '21 07:06 sordidlist

Weird. I switched from Linux containers to Windows containers, then back to Linux containers, and now the container seems to run.

sordidlist avatar Jun 01 '21 16:06 sordidlist

I've also encountered this error, using Docker Desktop for Windows 10. And for some reason, downgrading to 3.3.0 hasn't changed the error at all. nvidia-smi gives me proper output (SMI v 470.14, CUDA v 11.3), and docker -v gives me "20.10.5, build 55c4c88"

I am also facing the same issue with this setting. No luck yet.

johnny12150 avatar Jun 02 '21 17:06 johnny12150

I am facing the issue again. Can we get a proper solution for this? I have tried reinstalling the NVIDIA driver and Docker, but it does not resolve the issue.

docker version: 20.10.2
NVIDIA-SMI 460.27.04   Driver Version: 460.27.04   CUDA Version: 11.2

nvidia-docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: timed out: unknown.

jzhang82119 avatar Jul 15 '21 19:07 jzhang82119

@jzhang82119 I am having the same issue as you. I think we can only wait for an update.

TheMarshalMole avatar Jul 23 '21 17:07 TheMarshalMole

@jzhang82119 I also have exactly the same issue.

docker version: Docker version 20.10.5, build 55c4c88

LiyuanHsu avatar Jul 24 '21 19:07 LiyuanHsu

Resolved by downgrading:

$ sudo apt install nvidia-docker2:amd64=2.5.0-1 \
           libnvidia-container-tools:amd64=1.3.3-1 \
           nvidia-container-runtime:amd64=3.4.2-1 \
           libnvidia-container1:amd64=1.3.3-1 \
           nvidia-container-toolkit:amd64=1.4.2-1

as suggested here:

#1496

Fixed the issue. My compliments.

rdslater avatar Aug 17 '21 03:08 rdslater

I had the same problem. Updating Docker Desktop to 4.1.1 solved the problem.

pawelseweryn avatar Oct 31 '21 12:10 pawelseweryn