MPS cannot be used in a container and on the host at the same time
Dear sirs.
I use a V100 32 GB GPU server (driver 440.33.01) both to serve a web application and to train my deep learning models.
I deploy a multi-process web server in a Docker container under nvidia-cuda-mps.
Everything works fine when I run (following #419):
sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 nvidia-cuda-mps-control -d
docker run --gpus device=0 --ipc=host -e GUNICORN_WORKERS=2 --name test -it --rm -p 8285:9000 jh:v2
But once the container is running, I can no longer run Python GPU scripts on the host machine. The host program just hangs, with no warning and no failure message.
Is it possible that MPS cannot serve a container and the host at the same time? Thank you for your help!
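For anyone hitting the same hang, one way to inspect what MPS is doing (a sketch, assuming the control daemon was started as above and uses the default pipe and log locations) is to ask the daemon which server is active and which user owns it:

# List the PIDs of active MPS server instances
echo get_server_list | sudo nvidia-cuda-mps-control
# Check which user owns the server process (replace <server_pid> with a PID from the list above)
ps -o user,pid,cmd -p <server_pid>
# The control daemon log (default location) also records server start/stop events
cat /var/log/nvidia-mps/control.log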
I found the solution!
The Docker container user must be the same as the host machine user. So you need to add "-u 1000:1000" (the host user's UID:GID), like:
sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 nvidia-cuda-mps-control -d
docker run --gpus device=0 --ipc=host -u 1000:1000 -e GUNICORN_WORKERS=2 --name test -it --rm -p 8285:9000 jh:v2
With that, the host machine user can run the GPU Python scripts without hanging. As far as I understand, this is because MPS serves one user at a time: if the container's processes connect as root (the default container user), the MPS server belongs to root, and clients from the host user are queued until that server goes away. Running the container with the host user's UID lets both sides share the same MPS server.
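A quick way to double-check that the UIDs line up (a sketch, assuming the image provides the standard id utility and the container is named test as above):

# The UID and GID to pass with -u (e.g. 1000 and 1000)
id -u && id -g
# Confirm the container's processes run as that UID
docker exec test id -u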
Can this be closed now?