Torchserve 0.8.1: ONNX GPU models not working
🐛 Describe the bug
I recently updated the torchserve version from 0.7.1-gpu to 0.8.1-gpu.
Current setup
I used torchserve:0.7.1-gpu from source and built a docker image with torch 2.0+cpu. The ONNX GPU models were running, and they used ~8.5 GB memory and ~4 GB GPU (CUDA 11.7).
Bug
- After the recent update to torchserve 0.8.1, the torch 2.0+cpu setup no longer worked and failed with the following error:
[W:onnxruntime:Default, onnxruntime_pybind_state.cc:578 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
- I managed to fix this by using torch 2.0 with GPU dependencies, but doing so increased the memory (~13 GB) and GPU (~6 GB) consumption.
The models were not updated. I built torchserve 0.8.1 with ./build_image.sh -py 3.8 -cv cu117 -g -t torchserve_py38
Error logs
[W:onnxruntime:Default, onnxruntime_pybind_state.cc:578 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
Installation instructions
Yes I ran ./build_image.sh -py 3.8 -cv cu117 -g -t torchserve_py38
FROM torchserve_py38
# For the CPU version: pip install torch==2.0 --extra-index-url https://download.pytorch.org/whl/cpu
RUN pip install torch==2.0
RUN pip install onnxruntime-gpu==1.13.1
...
...
Model Packaging
I converted PyTorch models to ONNX and served them using a custom handler.
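For reference, the export step looks roughly like this (a minimal sketch with a placeholder torchvision model and input shape, not my actual model):

import torch
import torchvision

# Placeholder model and input; substitute the real PyTorch model and a
# representative input tensor.
model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX so the .onnx file can be packaged with torch-model-archiver
# and loaded by the custom handler.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)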
config.properties
No response
Versions
Working
Torchserve branch:
torchserve==0.7.1b20230208
torch-model-archiver==0.7.1b20230208
Python version: 3.8 (64-bit runtime)
Python executable: /usr/bin/python
Versions of relevant python libraries:
captum==0.6.0
intel-extension-for-pytorch==1.13.0
numpy==1.22.2
nvgpu==0.9.0
psutil==5.6.7
pygit2==1.11.1
pylint==2.6.0
pytest==7.2.1
pytest-cov==4.0.0
pytest-mock==3.10.0
requests==2.31.0
requests-toolbelt==0.10.1
sentencepiece==0.1.97
torch==2.0.0+cpu
torch-model-archiver==0.7.1b20230208
torch-workflow-archiver==0.2.7b20230208
torchaudio==0.13.1+cu117
torchserve==0.7.1b20230208
torchtext==0.14.1
torchvision==0.15.1+cpu
transformers==4.10.0
wheel==0.40.0
torch==2.0.0+cpu
torchtext==0.14.1
torchvision==0.15.1+cpu
torchaudio==0.13.1+cu117
Java Version:
OS: Ubuntu 20.04.5 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: version 3.26.4
Bug
-----------------------------------------------------------------------------------------
Torchserve branch:
torchserve==0.8.1
torch-model-archiver==0.8.1
Python version: 3.8 (64-bit runtime)
Python executable: /home/venv/bin/python
Versions of relevant python libraries:
captum==0.6.0
numpy==1.22.2
nvgpu==0.10.0
psutil==5.6.7
requests==2.31.0
sentencepiece==0.1.97
torch==2.0.0
torch-model-archiver==0.8.1
torch-workflow-archiver==0.2.9
torchaudio==2.0.2+cu117
torchdata==0.6.1
torchserve==0.8.1
torchtext==0.15.2+cpu
torchvision==0.15.1
transformers==4.10.0
wheel==0.40.0
torch==2.0.0
torchtext==0.15.2+cpu
torchvision==0.15.1
torchaudio==2.0.2+cu117
Java Version:
OS: Ubuntu 20.04.6 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: version 3.26.4
Is CUDA available: Yes
CUDA runtime version: N/A
GPU models and configuration:
GPU 0: Tesla T4
Nvidia driver version: 470.161.03
cuDNN version: None
Repro instructions
Issue 1
- Convert a PyTorch model to ONNX and run it with python3.8, onnxruntime-gpu==1.13.1, torchserve 0.7.1-gpu and torch 2.0.0+cpu. (Take note of the GPU and memory consumption.)
- Build a new torchserve image with ./build_image.sh -py 3.8 -cv cu117 -g -t torchserve_py38. Run the same model with onnxruntime-gpu==1.13.1 and torch 2.0.0+cpu.
Issue 2
- Use torch 2.0 instead of torch 2.0+cpu; the memory and GPU consumption will increase.
Possible Solution
No response
So the logic for whether to use the ONNX CUDA environment is controlled here
https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py#L80-L84
Our tests were passing on both CPU and GPU, so I don't suspect there's a bug with how this is set
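For context, the gist of that check is something like the following (a simplified sketch of the idea, not the verbatim base_handler code):

import torch
import onnxruntime as ort

# Sketch: request the CUDA execution provider only when torch reports a usable
# GPU, otherwise fall back to CPU.
providers = (
    ["CUDAExecutionProvider", "CPUExecutionProvider"]
    if torch.cuda.is_available()
    else ["CPUExecutionProvider"]
)
session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())  # shows which providers were actually loaded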
Bug 1
What's confusing me about your setup is that you're using both the ONNX GPU runtime and a CPU build of torch?
[W:onnxruntime:Default, onnxruntime_pybind_state.cc:578 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
It seems to me like what's happening is that in your handler your map_location is cuda, so make sure that's not the case: https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py#L112
Bug 2
Are you using the same dependencies for onnx and onnx runtime when measuring the extra memory overhead? torch 2.0 in general has way more dependencies, but the overhead you're seeing is significant.
I suspect you should be able to repro your errors without torchserve in the loop, which will make debugging this a bit easier. Let me know if this all makes sense.
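For example, a standalone check along these lines (a sketch, assuming the same model.onnx the handler loads) should hit the same provider error outside TorchServe if the CUDA dependencies are missing:

import onnxruntime as ort

# What onnxruntime-gpu thinks it can use in this container.
print(ort.get_device())               # "GPU" if the CUDA provider is usable
print(ort.get_available_providers())  # should include "CUDAExecutionProvider"

# Requesting the CUDA provider reproduces the warning/fallback from the
# TorchServe logs if cuBLAS/cuDNN etc. cannot be loaded.
session = ort.InferenceSession(
    "model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
print(session.get_providers())        # falls back to CPUExecutionProvider on failure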
Hello @msaroufim ,
Following your comments, I simplified the requirements and the Dockerfile to make sure there's nothing wrong with my setup.
Bug 1
I don't understand why I should set the map_location=None when I want my model to use CUDA.
Summary: With the updated setup I still have the error. (Failed to create CUDAExecutionProvider)
My test setup is available here: https://github.com/dt-subaandh-krishnakumar/pytorch_issue. I attached the Logs, Dockerfiles, GPU info. This issue occurs in all onnx models (This is the one used for testing https://huggingface.co/docs/transformers/serialization).
Possible Solution
During my test I observed that the following libraries are missing in the torchserve 0.8.1 docker image (a quick check for this is sketched after the list).
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
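A quick way to confirm this from inside the container is to try loading the shared libraries directly (a diagnostic sketch; the sonames below assume the CUDA 11.x / cuDNN 8.x stack that onnxruntime-gpu 1.13 is built against):

import ctypes

# Libraries the CUDA execution provider needs at load time.
libs = ["libcublas.so.11", "libcublasLt.so.11", "libcudnn.so.8", "libcurand.so.10", "libcufft.so.10"]
for lib in libs:
    try:
        ctypes.CDLL(lib)
        print(f"{lib}: OK")
    except OSError as err:
        print(f"{lib}: MISSING ({err})")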
Bug 2
I believe that if Bug 1 is fixed this won't be a problem, as I won't need to install torch 2.0, which has a lot of other dependencies.
Please let me know if you need any more information.
Possible Solution: During my test I observed that the following libraries are missing in the torchserve 0.8.1 docker image.
This is interesting, tagging @agunapal since a similar issue came up with deepspeed - I was not aware ONNX depends on all of this. In that case, you can check if your issue goes away if you build a new docker image with the nvidia runtime like so: https://github.com/pytorch/serve/blob/master/docker/Dockerfile#L6C3-L6C143
There were some errors with this https://github.com/pytorch/serve/blob/master/docker/Dockerfile#L6C3-L6C143
In my case, this worked
docker build --file Dockerfile --build-arg BASE_IMAGE=nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 --build-arg PYTHON_VERSION=3.8 -t torchserve:0.8.1 .
but I have a new issue: the metrics API returns nothing. Any idea why?
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 850, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/local/nvidia/lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v3
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ts/metrics/metric_collector.py", line 27, in <module>
system_metrics.collect_all(sys.modules['ts.metrics.system_metrics'], arguments.gpu)
File "/home/venv/lib/python3.8/site-packages/ts/metrics/system_metrics.py", line 119, in collect_all
value(num_of_gpu)
File "/home/venv/lib/python3.8/site-packages/ts/metrics/system_metrics.py", line 90, in gpu_utilization
statuses = list_gpus.device_statuses()
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 75, in device_statuses
return [device_status(device_index) for device_index in range(device_count)]
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 75, in <listcomp>
return [device_status(device_index) for device_index in range(device_count)]
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 19, in device_status
nv_procs = nv.nvmlDeviceGetComputeRunningProcesses(handle)
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 2608, in nvmlDeviceGetComputeRunningProcesses
return nvmlDeviceGetComputeRunningProcesses_v3(handle);
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 2576, in nvmlDeviceGetComputeRunningProcesses_v3
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v3")
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 853, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
2023-06-27T14:49:12,149 [ERROR] Thread-1 org.pytorch.serve.metrics.MetricCollector - Traceback (most recent call last):
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 850, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/local/nvidia/lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v3
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ts/metrics/metric_collector.py", line 27, in <module>
system_metrics.collect_all(sys.modules['ts.metrics.system_metrics'], arguments.gpu)
File "/home/venv/lib/python3.8/site-packages/ts/metrics/system_metrics.py", line 119, in collect_all
value(num_of_gpu)
File "/home/venv/lib/python3.8/site-packages/ts/metrics/system_metrics.py", line 90, in gpu_utilization
statuses = list_gpus.device_statuses()
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 75, in device_statuses
return [device_status(device_index) for device_index in range(device_count)]
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 75, in <listcomp>
return [device_status(device_index) for device_index in range(device_count)]
File "/home/venv/lib/python3.8/site-packages/nvgpu/list_gpus.py", line 19, in device_status
nv_procs = nv.nvmlDeviceGetComputeRunningProcesses(handle)
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 2608, in nvmlDeviceGetComputeRunningProcesses
return nvmlDeviceGetComputeRunningProcesses_v3(handle);
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 2576, in nvmlDeviceGetComputeRunningProcesses_v3
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v3")
File "/home/venv/lib/python3.8/site-packages/pynvml/nvml.py", line 853, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
Seems like an NVIDIA driver issue - see this for example https://github.com/NVIDIA/k8s-device-plugin/issues/331
Try updating this line https://github.com/pytorch/serve/blob/master/docker/build_image.sh#L46 to nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 and then run the build_image.sh script
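To confirm it's the driver rather than TorchServe, a minimal pynvml check (a sketch using pynvml's public API) reproduces the same call the metrics collector makes:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
# In recent pynvml this dispatches to nvmlDeviceGetComputeRunningProcesses_v3,
# which requires a driver new enough to export that symbol.
print(pynvml.nvmlDeviceGetComputeRunningProcesses(handle))
pynvml.nvmlShutdown()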
I updated build_image.sh and rebuilt the image (./build_image.sh -g). I'm able to initialize the models, but the Metrics API isn't working (curl http://127.0.0.1:8082/metrics) and returns an empty response.
Thanks @dt-subaandh-krishnakumar I believe that sounds like a separate issue - tagging @namannandan who owns this
Might make sense to open a separate issue for this though so we don't lose it
Fixed in this PR https://github.com/pytorch/serve/pull/2435
I faced this issue; you need to install CUDA 11.8 and the corresponding torch build for CUDA 11.8.
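A quick way to verify the installed torch build actually matches CUDA 11.8 (a small sketch):

import torch

print(torch.__version__)          # e.g. a +cu118 build
print(torch.version.cuda)         # expect "11.8"
print(torch.cuda.is_available())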