MPS cannot be used in a container and on the host at the same time
Dear sirs.
I use a V100 32 GB GPU server (driver 440.33.01) both to serve a web application and to train my deep learning models.
I deploy a multi-process web server in a Docker container under nvidia-cuda-mps.
Everything works fine when I run (following #419):
sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 nvidia-cuda-mps-control -d
docker run --gpus device=0 --ipc=host -e GUNICORN_WORKERS=2 --name test -it --rm -p 8285:9000 jh:v2
But once the container is running, I can no longer run Python GPU scripts on the host machine. The host program just hangs, with no warning and no failure message.
Is it possible that MPS cannot serve a container and the host at the same time? Thank you for your help!
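For anyone hitting the same hang, one way to inspect what MPS is doing (a sketch, assuming the control daemon was started as above and uses the default pipe and log locations) is to ask the daemon which server is active and which user owns it:

# List the PIDs of active MPS server instances
echo get_server_list | sudo nvidia-cuda-mps-control
# Check which user owns the server process (replace <server_pid> with a PID from the list above)
ps -o user,pid,cmd -p <server_pid>
# The control daemon log (default location) also records server start/stop events
cat /var/log/nvidia-mps/control.log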
I found the solution!
The Docker container user must be the same as the host machine user. So you need to add "-u 1000:1000" (the host user's UID:GID), like:
sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 nvidia-cuda-mps-control -d
docker run --gpus device=0 --ipc=host -u 1000:1000 -e GUNICORN_WORKERS=2 --name test -it --rm -p 8285:9000 jh:v2
With that, the host machine user can run the GPU Python scripts without hanging. As far as I understand, this is because MPS serves one user at a time: if the container's processes connect as root (the default container user), the MPS server belongs to root, and clients from the host user are queued until that server goes away. Running the container with the host user's UID lets both sides share the same MPS server.
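A quick way to double-check that the UIDs line up (a sketch, assuming the image provides the standard id utility and the container is named test as above):

# The UID and GID to pass with -u (e.g. 1000 and 1000)
id -u && id -g
# Confirm the container's processes run as that UID
docker exec test id -u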
Can this be closed now?