Unable to Find libnvidia-ml.so.1 When Using "docker compose linux-gpu up"
Here is the output of my command. Is this error raised inside the container or on the host? The part that puzzles me is:
genai-stack-pull-model-1 | pulling ollama model llama2 using http://llm-gpu:11434
The docs told me to add that URL to the .env file. However, I certainly don't have a server running there.
$ docker compose --profile linux-gpu up
WARN[0000] The "LANGCHAIN_PROJECT" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string.
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string.
WARN[0000] The "OPENAI_API_KEY" variable is not set. Defaulting to a blank string.
[+] Running 4/4
✔ llm-gpu 3 layers [⣿⣿⣿] 0B/0B Pulled 1.3s
✔ aece8493d397 Already exists 0.0s
✔ 3b9196308e0f Already exists 0.0s
✔ e75cbce7870b Already exists 0.0s
[+] Building 0.0s (0/0) docker:desktop-linux
[+] Running 8/8
✔ Container genai-stack-llm-gpu-1 Created 0.0s
✔ Container genai-stack-database-1 Running 0.0s
✔ Container genai-stack-pull-model-1 Recreated 0.1s
✔ Container genai-stack-api-1 Recreated 0.1s
✔ Container genai-stack-bot-1 Recreated 0.1s
✔ Container genai-stack-pdf_bot-1 Recreated 0.1s
✔ Container genai-stack-loader-1 Recreated 0.1s
✔ Container genai-stack-front-end-1 Recreated 0.1s
Attaching to genai-stack-api-1, genai-stack-bot-1, genai-stack-database-1, genai-stack-front-end-1, genai-stack-llm-gpu-1, genai-stack-loader-1, genai-stack-pdf_bot-1, genai-stack-pull-model-1
genai-stack-pull-model-1 | pulling ollama model llama2 using http://llm-gpu:11434
genai-stack-pull-model-1 | Error: Head "http://llm-gpu:11434/": dial tcp 172.18.0.4:11434: connect: no route to host
genai-stack-pull-model-1 exited with code 1
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
The docs told me to add that URL to the .env file. However, I certainly don't have a server running there.
If the container genai-stack-llm-gpu-1 is running, then you do have a server running at http://llm-gpu:11434/ inside the Docker network.
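To double-check that, you can probe the endpoint from inside the compose network. This is a sketch assuming the default genai-stack service names (llm-gpu, api) and that curl is available in the api image; adjust to your compose file:

```shell
# Is the Ollama service actually up?
docker compose --profile linux-gpu ps llm-gpu

# Query Ollama's root endpoint from another service on the same network;
# Ollama normally answers a plain "Ollama is running" on GET /
docker compose --profile linux-gpu exec api curl -s http://llm-gpu:11434/
```

If that probe fails with "no route to host" as in your log, the llm-gpu container most likely never started, which matches the nvidia-container-cli error below.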
What seems to be the issue here is your NVIDIA runtime integration with Docker.
Are you able to run this command successfully?
docker run -it --rm --gpus all ubuntu nvidia-smi
If not, try reinstalling Docker.
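Before a full reinstall, it may be enough to (re)install and register the NVIDIA Container Toolkit. A hypothetical recovery sequence on Ubuntu, assuming NVIDIA's apt repository is already configured:

```shell
# Install (or reinstall) the NVIDIA Container Toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the nvidia runtime in /etc/docker/daemon.json and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Re-test GPU visibility from inside a container
docker run -it --rm --gpus all ubuntu nvidia-smi
```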
@matthieuml , I've faced the same issue and tried the command you proposed. The error in the result is the same as the one I see when running the GenAI stack with --profile linux-gpu, namely:
#…
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
I have also followed your advice from #62 (installed nvidia-container-toolkit) but nothing has changed.
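For anyone debugging the same thing, a couple of checks that can help narrow down where libnvidia-ml.so.1 is expected to come from (a sketch, assuming an Ubuntu host):

```shell
# The library ships with the host NVIDIA driver; confirm the host can see it
ldconfig -p | grep libnvidia-ml.so.1

# Check which engine the Docker CLI is talking to. Docker Desktop uses its
# own "desktop-linux" context backed by a VM that does not have the host
# NVIDIA driver, which would explain the missing library inside containers
docker context ls
```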
The main hint here seems to be that I run the stack in Docker Desktop 4.26.1 (on Ubuntu 23.10). nvidia-smi displays the following about the GPU:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 ... Off | 00000000:01:00.0 On | N/A |
| N/A 47C P8 11W / 55W | 628MiB / 6144MiB | 4% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
Some reports I've found on the net so far suggest using Docker CE instead of Docker Desktop. But that looks like the opposite of what the GenAI stack promotes: an easy and developer-friendly way to build LLM-powered applications.
Is there another way to resolve the issue?
After looking around a bit, it seems that nvidia-container-toolkit needs docker-ce installed as root to work (which isn't the case with Docker Desktop?).
The obvious way to resolve this issue would be to use docker-ce installed as root, or even podman as an alternative. The Docker CLI is well documented, and in combination with docker-compose you can deploy the stack quite easily.
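Note that Docker CE and Docker Desktop can coexist; the CLI just needs to be pointed at the root engine instead of Desktop's VM. A sketch, assuming docker-ce is already installed:

```shell
# Switch from Docker Desktop's "desktop-linux" context (visible in the
# build log above) to the native root engine
docker context use default

# Confirm the nvidia runtime is registered and the GPU is reachable
docker info --format '{{.Runtimes}}'
docker run --rm --gpus all ubuntu nvidia-smi
```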
However, if you want to keep a developer-friendly UI, maybe you could use portainer-ce in combination with docker-ce as root?
@matthieuml ,
which isn't the case with Docker Desktop?
Yes, this seems to be the root cause.
Ok, I'll switch to Docker CE.
Perhaps it's worth adding a note to the README.md about Docker Desktop's incompatibility with the linux-gpu profile, as well as mentioning the need to install nvidia-container-toolkit.
Thank you!
The issue seems to be in a system package.
I am using a cog.yaml file to install system dependencies. I have tried different versions of the NVIDIA drivers and CUDA and I get the same error. Is there any way to install that package at the system level?
@Toparvion were you able to run Docker? I'm having the same issue, so could you help me?
@suveerudayashankara Can you please share your host machine's operating system and your Docker and Docker Engine versions?
@suveerudayashankara If you are using Linux, please also install the NVIDIA driver support packages for Docker (the NVIDIA Container Toolkit). I hope that solves your issue.
@suveerudayashankara , yes, I followed the advice above to switch from Docker Desktop to Docker CE (+ Portainer) and it worked for me.