[BUG] GPU not working when running as root
To Reproduce
I installed everything required to configure the GPU as per https://gist.github.com/tomlankhorst/33da3c4b9edbde5c83fc1244f010815c and https://github.com/Dokploy/dokploy/issues/816,
including manually updating the service:
docker service update \
--replicas 1 \
--mount-add type=bind,source=/usr/bin/nvidia-container-runtime,target=/usr/bin/nvidia-container-runtime,readonly \
--mount-add type=bind,source=/etc/docker/daemon.json,target=/etc/docker/daemon.json,readonly \
--mount-add type=bind,source=/etc/dokploy,target=/etc/dokploy \
--mount-add type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
--mount-add type=volume,source=dokploy-docker-config,target=/root/.docker \
--publish-add published=3000,target=3000,mode=host \
--update-parallelism 1 \
--update-order stop-first \
--constraint-add 'node.role == manager' \
--generic-resource-add "gpu=1" \
--env-add ADVERTISE_ADDR=$MYIP \
--env-add NVIDIA_VISIBLE_DEVICES=all \
dokploy
(Without it, none of the options showed in green, even though I was already running Docker containers manually with GPU access.)
Current vs. Expected behavior
The current behavior is an error saying the GPU can't be configured:
Migration complete
Setting up cron jobs....
Server Started: 3000
Starting Deployment Worker
GPU Setup Error: Error: Failed to configure GPU support. Please ensure you have sudo privileges and try again.
at f (.next/server/chunks/153.js:346:53)
at async l (.next/server/chunks/153.js:339:11127)
at async (.next/server/chunks/1463.js:4:9463)
But I'm running all commands as the root user, and I even tried updating the service to run as root:
docker service update \
--user root \
dokploy
I've checked that it was updated correctly:
root@Ubuntu-2204-jammy-amd64-base ~ # docker service inspect hq --pretty
ID: hqjrklhp1bh546k8hup9o6aoo
Name: dokploy
Service Mode: Replicated
Replicas: 1
UpdateStatus:
State: completed
Started: 6 minutes ago
Completed: 5 minutes ago
Message: update completed
Placement:
Constraints: [node.role == manager node.role == manager]
UpdateConfig:
Parallelism: 1
On failure: pause
Monitoring Period: 5s
Max failure ratio: 0
Update order: stop-first
RollbackConfig:
Parallelism: 1
On failure: pause
Monitoring Period: 5s
Max failure ratio: 0
Rollback order: stop-first
ContainerSpec:
Image: dokploy/dokploy:latest@sha256:7ce688a60fd5ff1d582e27003327d15685082291747f19984c4125e1aaff72de
Env: ADVERTISE_ADDR=HEREISMYIP NVIDIA_VISIBLE_DEVICES=all
Init: false
User: root
Mounts:
Target: /etc/docker/daemon.json
Source: /etc/docker/daemon.json
ReadOnly: true
Type: bind
Target: /etc/dokploy
Source: /etc/dokploy
ReadOnly: false
Type: bind
Target: /usr/bin/nvidia-container-runtime
Source: /usr/bin/nvidia-container-runtime
ReadOnly: true
Type: bind
Target: /var/run/docker.sock
Source: /var/run/docker.sock
ReadOnly: false
Type: bind
Target: /root/.docker
Source: dokploy-docker-config
ReadOnly: false
Type: volume
Resources:
Networks: dokploy-network
Endpoint Mode: vip
Ports:
PublishedPort = 3000
Protocol = tcp
TargetPort = 3000
PublishMode = host
I expected GPU to be enabled successfully
Provide environment information
Ubuntu 22.04
Which area(s) are affected? (Select all that apply)
Application
Are you deploying the applications where Dokploy is installed or on a remote server?
Same server where Dokploy is installed
Additional context
Hetzner GPU server
Will you send a PR to fix it?
No
@kikoncuo Install the NVIDIA drivers and the NVIDIA Container Toolkit, then check the UI: it will detect them and show green. Use the refresh button to fetch the updates. Note:
- Swarm GPU support will not be enabled based on manual configuration; it might not detect what was configured on your end.
- But that doesn't mean you can't deploy GPU-based apps. Just try your services; if they don't work, your manual config is not correct. Check both of these docs: https://gist.github.com/tomlankhorst/33da3c4b9edbde5c83fc1244f010815c and https://gist.github.com/coltonbh/374c415517dbeb4a6aa92f462b9eb287
I suggest not waiting for the UI to detect your manual config; try out your services.
Ok, no response, and no one else has experienced the same issue, so we can close this issue for now.
@kikoncuo Same issue. Could you please tell me how to fix it?
cat /etc/nvidia-container-runtime/config.toml
swarm-resource = "DOCKER_RESOURCE_GPU"
cat /etc/docker/daemon.json
{
"data-root": "/data/docker-data",
"default-runtime": "nvidia",
"node-generic-resources": [
"gpu=GPU-13efa0bb-0379-f661-b81f-38e2b3c005c0",
"gpu=GPU-6d5cfac8-cd1c-53ee-c314-9d5bb877cb28",
"gpu=GPU-26751515-5e3e-fa16-024e-ee26f754eae5",
"gpu=GPU-c4b0146e-7914-0a81-0f9d-faf480945d13",
"gpu=GPU-a9c877b3-3a30-af26-a499-f90888ce83ef"
],
"registry-mirrors": [
"https://docker.m.daocloud.io"
],
"runtimes": {
"nvidia": {
"args": [],
"path": "nvidia-container-runtime"
}
}
}
docker service create \
--name dokploy \
--replicas 1 \
--network dokploy-network \
--mount type=bind,source=/usr/bin/nvidia-container-runtime,target=/usr/bin/nvidia-container-runtime,readonly \
--mount type=bind,source=/etc/docker/daemon.json,target=/etc/docker/daemon.json,readonly \
--mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
--mount type=bind,source=/etc/dokploy,target=/etc/dokploy \
--mount type=bind,source=/data/dokploy/dokploy-config,target=/root/.docker \
--publish published=30000,target=3000,mode=host \
--update-parallelism 1 \
--update-order stop-first \
--constraint 'node.role == manager' \
--generic-resource "gpu=1" \
-e ADVERTISE_ADDR=$advertise_addr \
-e NVIDIA_VISIBLE_DEVICES=all \
dokploy/dokploy:v0.24.12
Running `apt install sudo` in the dokploy container does not fix GPU Setup Error: Error: Failed to configure GPU support. Please ensure you have sudo privileges and try again.
@vishalkadam47 Could you please help me fix this?
Although I have the drivers installed and nvidia-smi works fine, I cannot get it running:
# docker service update --generic-resource-add "gpu=1" --env-add NVIDIA_VISIBLE_DEVICES=all np1owfg4eh5h
overall progress: 0 out of 1 tasks
1/1: no suitable node (insufficient resources on 1 node)
docker service ps np1owfg4eh5h
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
gpbtpahmqncf dokploy.1 dokploy/dokploy:latest Running Pending 26 seconds ago "no suitable node (insufficient resources on 1 node)"
Ok, I've solved part of this problem:
- using:
nvidia-smi -L | awk -F'UUID: ' '{print $2}' | awk -F')' '{print $1}'
returns GPU-xxy-zz-yy-aa-bb.
- then:
nano /etc/docker/daemon.json:
{
"runtimes": {
"nvidia": {
"args": [],
"path": "nvidia-container-runtime"
}
},
"default-runtime": "nvidia",
"node-generic-resources": [
"GPU=GPU-xxy-zz-yy-aa-bb" <--- HERE
]
}
- and then
systemctl restart docker
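As a side note, the UUID-extraction pipeline from the first step can be sanity-checked against a canned line of `nvidia-smi -L` output before touching `daemon.json`. A minimal sketch (the GPU model and UUID below are made up):

```shell
# One hypothetical line of `nvidia-smi -L` output
sample='GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-00000000-1111-2222-3333-444444444444)'

# Same pipeline as above: keep the text between "UUID: " and the closing ")"
echo "$sample" | awk -F'UUID: ' '{print $2}' | awk -F')' '{print $1}'
# prints: GPU-00000000-1111-2222-3333-444444444444
```

Running the same pipeline against the real `nvidia-smi -L` gives you the exact string to paste into `node-generic-resources`.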
Now # docker service update --env-add NVIDIA_VISIBLE_DEVICES=all dokploy gives:
dokploy
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service dokploy converged
But still:
- NVIDIA Container Runtime is not visible although swarm-resource = "DOCKER_RESOURCE_GPU" is enabled in /etc/nvidia-container-runtime/config.toml
- Swarm GPU Support is missing
- and # docker service update --generic-resource-add "gpu=1" dokploy gives:
dokploy
overall progress: 0 out of 1 tasks
1/1: no suitable node (insufficient resources on 1 node)
Ok, next update:
# docker service update --mount-add type=bind,source=/usr/bin/nvidia-container-runtime,target=/usr/bin/nvidia-container-runtime,readonly dokploy
gives
dokploy
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service dokploy converged
and finally I have the same result as @sheiy:
but still Swarm GPU Support is missing and docker service update --generic-resource-add "gpu=1" dokploy gives:
dokploy
overall progress: 0 out of 1 tasks
1/1: no suitable node (insufficient resources on 1 node)
and when I try to enable GPU - I get exactly the same error as @sheiy - docker service logs dokploy:
dokploy.1 | GPU Setup Error: Error: Failed to configure GPU support. Please ensure you have sudo privileges and try again.
dokploy.1 | at f (.next/server/chunks/8512.js:8:50)
dokploy.1 | at async l (.next/server/chunks/8512.js:1:2650)
dokploy.1 | at async (.next/server/chunks/1515.js:9:23924)
dokploy.1 | severity_local: 'NOTICE',
dokploy.1 | severity: 'NOTICE',
dokploy.1 | code: '42P07',
dokploy.1 | message: 'relation "__drizzle_migrations" already exists, skipping',
dokploy.1 | file: 'parse_utilcmd.c',
dokploy.1 | line: '207',
dokploy.1 | routine: 'transformCreateStmt'
dokploy.1 | }
For a single GPU, this works:
/etc/docker/daemon.json
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"args": [],
"path": "nvidia-container-runtime"
}
},
"node-generic-resources": [
"GPU=1"
]
}
do you have it green?
Okay, so I just ran into this and kept circling back to it over a few weeks. I finally got heads-down on it and found a couple of things that seem to be unique to Dokploy and Swarm.
- The Docker daemon needs the generic resource to be set to "gpu=1", all lowercase. This is what Dokploy is set to look for. I tried the default NVIDIA-GPU and GPU, and neither showed up in Dokploy until I made that change.
- If the NVIDIA server and container drivers are installed and the test passes, you need to update the dokploy Docker service with the right nvidia runtime, daemon config, and env.
I used GPT to help put together a troubleshooting guide from everything I did to get it working. It's up to you whether to use the generic ID or the full UUID; I went with generic. Hope this helps!
Dokploy GPU Setup (Ubuntu 24.04 LTS + Docker Swarm)
This guide walks through enabling GPU support in Dokploy using Docker Swarm on Ubuntu 24.04 LTS and deploying Ollama as a GPU-enabled app.
Step 1: Check GPU + Driver
lspci | grep -i nvidia
nvidia-smi
Confirm your GPU appears and nvidia-smi outputs normal driver/CUDA information.
Step 2: Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
Step 3: Make NVIDIA the Default Docker Runtime
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
Verify:
docker info | grep -A2 "Runtimes"
# Should show: Runtimes: ... nvidia ... and Default Runtime: nvidia
Step 4: Confirm Runtime Config
Check /etc/nvidia-container-runtime/config.toml includes:
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = "DOCKER_RESOURCE_GPU"
[nvidia-container-runtime]
log-level = "info"
[nvidia-container-cli]
ldconfig = "@/sbin/ldconfig.real"
Step 5: Advertise the GPU to Docker Swarm
Edit /etc/docker/daemon.json:
{
"default-runtime": "nvidia",
"runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } },
"node-generic-resources": [ "gpu=1" ]
}
Restart Docker:
sudo systemctl restart docker
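Since the lowercase resource name is the detail Dokploy keys on, a quick grep can confirm it before restarting Docker. A minimal sketch, writing a hypothetical sample to /tmp rather than touching the real file:

```shell
# Hypothetical copy of the daemon.json above, written to /tmp for illustration
cat > /tmp/daemon-sample.json <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } },
  "node-generic-resources": [ "gpu=1" ]
}
EOF

# The entry must start with lowercase "gpu"; "GPU=1" or "NVIDIA-GPU=1" will not match
grep -o '"gpu=[^"]*"' /tmp/daemon-sample.json
# prints: "gpu=1"
```

Against the real file you would run `grep -o '"gpu=[^"]*"' /etc/docker/daemon.json`; no output means the resource name is missing or not lowercase.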
Step 6: Verify GPU Access with Plain Docker
docker run --rm --gpus all nvidia/cuda:12.5.1-base-ubuntu24.04 nvidia-smi
You should see the familiar nvidia-smi output table.
Step 7: Initialize Docker Swarm and Label GPU Node
docker swarm init # only once
docker node update --label-add gpu=true $(hostname)
Step 8: Run a Swarm GPU Test
docker service create --name gpu-test \
--generic-resource "gpu=1" \
--constraint 'node.labels.gpu==true' \
--restart-condition none \
nvidia/cuda:12.5.1-base-ubuntu24.04 nvidia-smi
docker service logs -f gpu-test
The GPU info should print successfully in the logs.
Step 9: Make Dokploy Detect the GPU
Give the running Dokploy service visibility into the GPU runtime:
docker service update dokploy \
--mount-add type=bind,source=/usr/bin/nvidia-container-runtime,target=/usr/bin/nvidia-container-runtime,readonly \
--mount-add type=bind,source=/etc/docker/daemon.json,target=/etc/docker/daemon.json,readonly \
--env-add NVIDIA_VISIBLE_DEVICES=all \
--generic-resource-add "gpu=1"
Then go to Dokploy -> Server GPU Setup -> Refresh. All checks should turn green.
Step 10: Deploy Ollama with GPU (Example Stack)
This follows Dokploy's [Compose conventions](https://docs.dokploy.com/docs/core/docker-compose):
using ../files for persistent storage and running under Stack mode.
version: "3.9"
services:
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434" # Ollama API
volumes:
# Dokploy-managed persistent data path
- "../files/ollama:/root/.ollama"
environment:
NVIDIA_VISIBLE_DEVICES: all
NVIDIA_DRIVER_CAPABILITIES: compute,utility
# Optional: OLLAMA_HOST: "0.0.0.0:11434"
deploy:
replicas: 1
placement:
constraints:
- node.labels.gpu==true # schedule on GPU node
resources:
reservations:
generic_resources:
- discrete_resource_spec:
kind: gpu
value: 1 # matches daemon.json "gpu=1"
restart_policy:
condition: on-failure
update_config:
parallelism: 1
order: start-first
Deploy this as a Stack inside Dokploy; your Ollama service will have GPU acceleration and persistent model storage under ../files/ollama.
Hopefully yours looks something like this now:
Quick Troubleshooting
| Problem | Fix |
|---|---|
| Dokploy still shows red | Ensure `gpu` (not `NVIDIA-GPU`) in `daemon.json`; restart Docker; rerun the Dokploy service update |
| Swarm service won't schedule | Check `docker node inspect self -f '{{json .Description.Resources.GenericResources}}' \| jq .` and verify it lists `"gpu": 1` |
| No logs from Swarm service | Use `docker service logs`, not `docker logs` |
| GPU test fails | Restart Docker, recheck the runtime (`docker info`), rerun the test container |
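For the "won't schedule" row, the inspect output can also be checked without `jq`; a sketch against the JSON shape Docker Swarm typically returns for a discrete generic resource (the sample string below is hypothetical, not captured from a real node):

```shell
# Hypothetical output of:
#   docker node inspect self -f '{{json .Description.Resources.GenericResources}}'
sample='[{"DiscreteResourceSpec":{"Kind":"gpu","Value":1}}]'

# A node that advertises the gpu resource will contain a "Kind":"gpu" entry
echo "$sample" | grep -o '"Kind":"gpu"'
# prints: "Kind":"gpu"
```

If the real command prints `null` or an empty list instead, the `node-generic-resources` entry in `daemon.json` was not picked up; restart Docker and re-check.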
Can this be added to the dokploy docs?
THANKS, it works!! (I modified the runtime for the 550 driver and an earlier CUDA version, but I managed to make it work with a GTX 1050, with PCI passthrough on Proxmox.) Thx @QuinnGT
For me, "gpu=1" did the trick (it works fine alongside the GPU UUID entry):
nano /etc/docker/daemon.json:
{
"runtimes": {
"nvidia": {
"args": [],
"path": "nvidia-container-runtime"
}
},
"default-runtime": "nvidia",
"node-generic-resources": [
"GPU=GPU-xxy-zz-yy-aa-bb",
"gpu=1"
]
}
Thanks @vovka93 & @QuinnGT
@QuinnGT thanks for the troubleshooting steps you shared. I'll add a few more checks in the UI soon, plus the ability to detect manual configuration.
@Siumauricio please add this to the Dokploy docs. I'll update the code soon with more proper checks and add detection for manual changes. @QuinnGT thanks