Torchserve 0.8.1: ONNX GPU models not working
🐛 Describe the bug
I recently updated the torchserve version from 0.7.1-gpu to 0.8.1-gpu.
Current setup
I used torchserve:0.7.1-gpu from source and built a docker image with torch 2.0+cpu. The ONNX GPU models were running, and they used ~8.5 GB memory and ~4 GB GPU (CUDA 11.7).
Bug
- After the recent update to torchserve 0.8.1, the torch 2.0+cpu setup no longer worked and failed with the following error:
[W:onnxruntime:Default, onnxruntime_pybind_state.cc:578 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
- I managed to fix this by using torch 2.0 with GPU dependencies, but doing so increased the memory (~13 GB) and GPU (~6 GB) consumption.
The models were not updated. I built torchserve 0.8.1 with ./build_image.sh -py 3.8 -cv cu117 -g -t torchserve_py38
Error logs
[W:onnxruntime:Default, onnxruntime_pybind_state.cc:578 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
Installation instructions
Yes I ran ./build_image.sh -py 3.8 -cv cu117 -g -t torchserve_py38
FROM torchserve_py38
# For the CPU version: pip install torch==2.0 --extra-index-url https://download.pytorch.org/whl/cpu
RUN pip install torch==2.0
RUN pip install onnxruntime-gpu==1.13.1
...
...
Model Packaging
I converted PyTorch models to ONNX and served them using a custom handler.
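For reference, the export step looks roughly like this (a minimal sketch with a placeholder torchvision model and input shape, not my actual model):

import torch
import torchvision

# Placeholder model and input; substitute the real PyTorch model and a
# representative input tensor.
model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX so the .onnx file can be packaged with torch-model-archiver
# and loaded by the custom handler.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)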
config.properties
No response
Versions
Working
Torchserve branch:
torchserve==0.7.1b20230208
torch-model-archiver==0.7.1b20230208
Python version: 3.8 (64-bit runtime)
Python executable: /usr/bin/python
Versions of relevant python libraries:
captum==0.6.0
intel-extension-for-pytorch==1.13.0
numpy==1.22.2
nvgpu==0.9.0
psutil==5.6.7
pygit2==1.11.1
pylint==2.6.0
pytest==7.2.1
pytest-cov==4.0.0
pytest-mock==3.10.0
requests==2.31.0
requests-toolbelt==0.10.1
sentencepiece==0.1.97
torch==2.0.0+cpu
torch-model-archiver==0.7.1b20230208
torch-workflow-archiver==0.2.7b20230208
torchaudio==0.13.1+cu117
torchserve==0.7.1b20230208
torchtext==0.14.1
torchvision==0.15.1+cpu
transformers==4.10.0
wheel==0.40.0
torch==2.0.0+cpu
torchtext==0.14.1
torchvision==0.15.1+cpu
torchaudio==0.13.1+cu117
Java Version:
OS: Ubuntu 20.04.5 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: version 3.26.4
Bug
-----------------------------------------------------------------------------------------
Torchserve branch:
torchserve==0.8.1
torch-model-archiver==0.8.1
Python version: 3.8 (64-bit runtime)
Python executable: /home/venv/bin/python
Versions of relevant python libraries:
captum==0.6.0
numpy==1.22.2
nvgpu==0.10.0
psutil==5.6.7
requests==2.31.0
sentencepiece==0.1.97
torch==2.0.0
torch-model-archiver==0.8.1
torch-workflow-archiver==0.2.9
torchaudio==2.0.2+cu117
torchdata==0.6.1
torchserve==0.8.1
torchtext==0.15.2+cpu
torchvision==0.15.1
transformers==4.10.0
wheel==0.40.0
torch==2.0.0
torchtext==0.15.2+cpu
torchvision==0.15.1
torchaudio==2.0.2+cu117
Java Version:
OS: Ubuntu 20.04.6 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: version 3.26.4
Is CUDA available: Yes
CUDA runtime version: N/A
GPU models and configuration:
GPU 0: Tesla T4
Nvidia driver version: 470.161.03
cuDNN version: None
Repro instructions
Issue 1
- Convert a PyTorch model to ONNX and run it with python3.8, onnxruntime-gpu==1.13.1, torchserve 0.7.1-gpu and torch 2.0.0+cpu. (Take note of the GPU and memory consumption.)
- Build a new torchserve image with ./build_image.sh -py 3.8 -cv cu117 -g -t torchserve_py38. Run the same model with onnxruntime-gpu==1.13.1 and torch 2.0.0+cpu.
Issue 2
- Use torch 2.0 instead of torch 2.0+cpu; the memory and GPU consumption will increase.
Possible Solution
No response
So the logic for whether to use the ONNX CUDA environment is controlled here
https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py#L80-L84
Our tests were passing on both CPU and GPU, so I don't suspect there's a bug with how this is set
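For context, the gist of that check is something like the following (a simplified sketch of the idea, not the verbatim base_handler code):

import torch
import onnxruntime as ort

# Sketch: request the CUDA execution provider only when torch reports a usable
# GPU, otherwise fall back to CPU.
providers = (
    ["CUDAExecutionProvider", "CPUExecutionProvider"]
    if torch.cuda.is_available()
    else ["CPUExecutionProvider"]
)
session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())  # shows which providers were actually loaded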
Bug 1
What's confusing me about your setup is that you're using both the ONNX GPU runtime and a CPU build of torch?
[W:onnxruntime:Default, onnxruntime_pybind_state.cc:578 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
It seems to me like what's happening is that in your handler your map_location is cuda, so make sure that's not the case: https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py#L112
Bug 2
Are you using the same dependencies for onnx and onnx runtime when measuring the extra memory overhead? torch 2.0 in general has way more dependencies, but the overhead you're seeing is significant.
I suspect you should be able to repro your errors without torchserve in the loop, which will make debugging this a bit easier. Let me know if this all makes sense.
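For example, a standalone check along these lines (a sketch, assuming the same model.onnx the handler loads) should hit the same provider error outside TorchServe if the CUDA dependencies are missing:

import onnxruntime as ort

# What onnxruntime-gpu thinks it can use in this container.
print(ort.get_device())               # "GPU" if the CUDA provider is usable
print(ort.get_available_providers())  # should include "CUDAExecutionProvider"

# Requesting the CUDA provider reproduces the warning/fallback from the
# TorchServe logs if cuBLAS/cuDNN etc. cannot be loaded.
session = ort.InferenceSession(
    "model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
print(session.get_providers())        # falls back to CPUExecutionProvider on failure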
Hello @msaroufim ,
Following your comments, I simplified the requirements and the Dockerfile to make sure there's nothing wrong with my setup.
Bug 1
I don't understand why I should set the map_location=None when I want my model to use CUDA.
Summary: With the updated setup I still have the error. (Failed to create CUDAExecutionProvider)
My test setup is available here: https://github.com/dt-subaandh-krishnakumar/pytorch_issue. I attached the Logs, Dockerfiles, GPU info. This issue occurs in all onnx models (This is the one used for testing https://huggingface.co/docs/transformers/serialization).
Possible Solution
During my test I observed that the following libraries are missing in the torchserve 0.8.1 docker image (a quick check for this is sketched after the list).
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
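A quick way to confirm this from inside the container is to try loading the shared libraries directly (a diagnostic sketch; the sonames below assume the CUDA 11.x / cuDNN 8.x stack that onnxruntime-gpu 1.13 is built against):

import ctypes

# Libraries the CUDA execution provider needs at load time.
libs = ["libcublas.so.11", "libcublasLt.so.11", "libcudnn.so.8", "libcurand.so.10", "libcufft.so.10"]
for lib in libs:
    try:
        ctypes.CDLL(lib)
        print(f"{lib}: OK")
    except OSError as err:
        print(f"{lib}: MISSING ({err})")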
Bug 2
I believe that if Bug 1 is fixed this won't be a problem, as I won't need to install torch 2.0, which has a lot of other dependencies.
Please let me know if you need any more information.
Possible Solution: During my test I observed that the following libraries are missing in the torchserve 0.8.1 docker image.
This is interesting, tagging @agunapal since a similar issue came up with deepspeed - I was not aware ONNX depends on all of this. In that case, you can check if your issue goes away if you build a new docker image with the nvidia runtime like so: https://github.com/pytorch/serve/blob/master/docker/Dockerfile#L6C3-L6C143
There were some errors with this https://github.com/pytorch/serve/blob/master/docker/Dockerfile#L6C3-L6C143
In my case, this worked
docker build --file Dockerfile --build-arg BASE_IMAGE=nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 --build-arg PYTHON_VERSION=3.8 -t torchserve:0.8.1 .
but I have a new issue: the metrics API returns nothing. Any idea why?
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 850, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/local/nvidia/lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v3
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ts/metrics/metric_collector.py", line 27, in <module>
system_metrics.collect_all(sys.modules['ts.metrics.system_metrics'], arguments.gpu)
File "/home/venv/lib/python3.8/site-packages/ts/metrics/system_metrics.py", line 119, in collect_all
value(num_of_gpu)
File "/home/venv/lib/python3.8/site-packages/ts/metrics/system_metrics.py", line 90, in gpu_utilization
statuses = list_gpus.device_statuses()
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 75, in device_statuses
return [device_status(device_index) for device_index in range(device_count)]
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 75, in <listcomp>
return [device_status(device_index) for device_index in range(device_count)]
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 19, in device_status
nv_procs = nv.nvmlDeviceGetComputeRunningProcesses(handle)
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 2608, in nvmlDeviceGetComputeRunningProcesses
return nvmlDeviceGetComputeRunningProcesses_v3(handle);
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 2576, in nvmlDeviceGetComputeRunningProcesses_v3
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v3")
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 853, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
2023-06-27T14:49:12,149 [ERROR] Thread-1 org.pytorch.serve.metrics.MetricCollector - Traceback (most recent call last):
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 850, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/local/nvidia/lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v3
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ts/metrics/metric_collector.py", line 27, in <module>
system_metrics.collect_all(sys.modules['ts.metrics.system_metrics'], arguments.gpu)
File "/home/venv/lib/python3.8/site-packages/ts/metrics/system_metrics.py", line 119, in collect_all
value(num_of_gpu)
File "/home/venv/lib/python3.8/site-packages/ts/metrics/system_metrics.py", line 90, in gpu_utilization
statuses = list_gpus.device_statuses()
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 75, in device_statuses
return [device_status(device_index) for device_index in range(device_count)]
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 75, in <listcomp>
return [device_status(device_index) for device_index in range(device_count)]
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 19, in device_status
nv_procs = nv.nvmlDeviceGetComputeRunningProcesses(handle)
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 2608, in nvmlDeviceGetComputeRunningProcesses
return nvmlDeviceGetComputeRunningProcesses_v3(handle);
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 2576, in nvmlDeviceGetComputeRunningProcesses_v3
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v3")
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 853, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
Seems like an NVIDIA driver issue - see this for example https://github.com/NVIDIA/k8s-device-plugin/issues/331
Try updating this line https://github.com/pytorch/serve/blob/master/docker/build_image.sh#L46 to nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 and then run the build_image.sh script
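To confirm it's the driver rather than TorchServe, a minimal pynvml check (a sketch using pynvml's public API) reproduces the same call the metrics collector makes:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
# In recent pynvml this dispatches to nvmlDeviceGetComputeRunningProcesses_v3,
# which requires a driver new enough to export that symbol.
print(pynvml.nvmlDeviceGetComputeRunningProcesses(handle))
pynvml.nvmlShutdown()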
I updated build_image.sh and rebuilt the image (./build_image.sh -g). I'm able to initialize the models, but the Metrics API isn't working (curl http://127.0.0.1:8082/metrics) and returns an empty response.
Thanks @dt-subaandh-krishnakumar I believe that sounds like a separate issue - tagging @namannandan who owns this
Might make sense to open a separate issue for this though so we don't lose it
Fixed in this PR https://github.com/pytorch/serve/pull/2435
I faced this issue; you need to install CUDA 11.8 and the corresponding torch build for CUDA 11.8.
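A quick way to verify the installed torch build actually matches CUDA 11.8 (a small sketch):

import torch

print(torch.__version__)          # e.g. a +cu118 build
print(torch.version.cuda)         # expect "11.8"
print(torch.cuda.is_available())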