
Got segmentation fault error when using 'InferenceSession' API

Open baoachun opened this issue 3 years ago • 5 comments

Describe the bug I'm using the onnxruntime Python API to run inference, but I get a segmentation fault when creating an 'InferenceSession'.

Urgency emergency

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS
  • ONNX Runtime installed from (source or binary): pypi
  • ONNX Runtime version: 1.11.0
  • Python version: 3.8.6
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: N
  • GPU model and memory: N

To Reproduce

import onnx
import onnxruntime as ort
import torch
import torchvision

model = torchvision.models.alexnet()
model.eval()
input_names = ['input']
output_names = ['output']
x = torch.randn(1,3,224,224, requires_grad=False)
torch.onnx.export(model, x, 'alexnet.onnx', input_names=input_names, output_names=output_names, verbose=True, opset_version=12)

model_onnx = onnx.load('alexnet.onnx')
onnx.checker.check_model(model_onnx)
session = ort.InferenceSession('alexnet.onnx')

Expected behavior The InferenceSession should be created successfully, without a segmentation fault.

Screenshots A gdb backtrace was attached as an image (not reproduced here).


baoachun avatar Jun 23 '22 10:06 baoachun

CC @pranavsharma

faxu avatar Jul 08 '22 21:07 faxu

Any updates? I am experiencing the same problem on onnxruntime==1.12.0. When using onnxruntime==1.11.0 it just hangs as described here:

https://github.com/microsoft/onnxruntime/issues/10166

TTrapper avatar Jul 27 '22 16:07 TTrapper

I cannot reproduce the issue. I used the exact same Python script you pasted in this issue and get no segfault.

(mypython3) [pranav@pranav-dev-centos79 ~]$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
(mypython3) [pranav@pranav-dev-centos79 ~]$ python -V
Python 3.8.13
(mypython3) [pranav@pranav-dev-centos79 ~]$ pip list | grep onnx
onnx        1.12.0
onnxruntime 1.12.0

pranavsharma avatar Jul 27 '22 22:07 pranavsharma

CentOS Linux release 7.6.1810 (Core)
Python 3.8.1
onnx 1.12.0
onnxruntime 1.12.0

Here is the full output I am getting from the above script. No segfault here, but it does crash:

Exported graph: graph(%input : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu),
      %features.0.weight : Float(64, 3, 11, 11, strides=[363, 121, 11, 1], requires_grad=1, device=cpu),
      %features.0.bias : Float(64, strides=[1], requires_grad=1, device=cpu),
      %features.3.weight : Float(192, 64, 5, 5, strides=[1600, 25, 5, 1], requires_grad=1, device=cpu),
      %features.3.bias : Float(192, strides=[1], requires_grad=1, device=cpu),
      %features.6.weight : Float(384, 192, 3, 3, strides=[1728, 9, 3, 1], requires_grad=1, device=cpu),
      %features.6.bias : Float(384, strides=[1], requires_grad=1, device=cpu),
      %features.8.weight : Float(256, 384, 3, 3, strides=[3456, 9, 3, 1], requires_grad=1, device=cpu),
      %features.8.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %features.10.weight : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=1, device=cpu),
      %features.10.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %classifier.1.weight : Float(4096, 9216, strides=[9216, 1], requires_grad=1, device=cpu),
      %classifier.1.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.4.weight : Float(4096, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.4.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.6.weight : Float(1000, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.6.bias : Float(1000, strides=[1], requires_grad=1, device=cpu)):
  %input.1 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[11, 11], pads=[2, 2, 2, 2], strides=[4, 4], onnx_name="Conv_0"](%input, %features.0.weight, %features.0.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_18 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_1"](%input.1) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.4 : Float(1, 64, 27, 27, strides=[46656, 729, 27, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_2"](%onnx::MaxPool_18) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.8 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[5, 5], pads=[2, 2, 2, 2], strides=[1, 1], onnx_name="Conv_3"](%input.4, %features.3.weight, %features.3.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_21 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_4"](%input.8) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.12 : Float(1, 192, 13, 13, strides=[32448, 169, 13, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_5"](%onnx::MaxPool_21) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.16 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_6"](%input.12, %features.6.weight, %features.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_24 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_7"](%input.16) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.20 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_8"](%onnx::Conv_24, %features.8.weight, %features.8.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_26 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_9"](%input.20) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.24 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_10"](%onnx::Conv_26, %features.10.weight, %features.10.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_28 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_11"](%input.24) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.28 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_12"](%onnx::MaxPool_28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %onnx::Flatten_30 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::AveragePool[kernel_shape=[1, 1], strides=[1, 1], onnx_name="AveragePool_13"](%input.28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1214:0
  %input.32 : Float(1, 9216, strides=[9216, 1], requires_grad=1, device=cpu) = onnx::Flatten[axis=1, onnx_name="Flatten_14"](%onnx::Flatten_30) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torchvision/models/alexnet.py:50:0
  %input.36 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_15"](%input.32, %classifier.1.weight, %classifier.1.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_33 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_16"](%input.36) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.40 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_17"](%onnx::Gemm_33, %classifier.4.weight, %classifier.4.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_35 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_18"](%input.40) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %output : Float(1, 1000, strides=[1000, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_19"](%onnx::Gemm_35, %classifier.6.weight, %classifier.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  return (%output)

Traceback (most recent call last):
  File "example_github.py", line 15, in <module>
    session = ort.InferenceSession('alexnet.onnx')
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 384, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:

TTrapper avatar Jul 29 '22 16:07 TTrapper

Still encountering this issue, and I can recreate it with different numpy versions (I pin onnxruntime = "==1.16.3" for CentOS 7 compatibility).

This produces a segfault:

numpy = "==2.0.0"
onnxruntime = "==1.16.3"

This does not:

numpy = "==1.26.4"
onnxruntime = "==1.16.3"
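
The version dependence above is consistent with the NumPy 2.0 ABI break: extension wheels compiled against the NumPy 1.x C API (as onnxruntime 1.16.3 was) are not binary-compatible with NumPy 2.0. A minimal sketch of an environment guard (`numpy_ok_for_ort_1_16` is a hypothetical helper, not an onnxruntime API):

```python
import numpy as np

# Hypothetical guard: flag a NumPy 2.x environment before any session.run()
# can segfault under an onnxruntime wheel built against NumPy 1.x.
def numpy_ok_for_ort_1_16(version: str = np.__version__) -> bool:
    return int(version.split('.')[0]) < 2

print(numpy_ok_for_ort_1_16('1.26.4'))  # True
print(numpy_ok_for_ort_1_16('2.0.0'))   # False
```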

The relevant trace is:

Fatal Python error: Segmentation fault

Current thread 0x000078c3488bb000 (most recent call first):
File ".../python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220 in run

I trigger this error with the following:

import onnxruntime
from transformers import AutoTokenizer

texts = ["some input text"]  # any list of strings

session = onnxruntime.InferenceSession("models/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("models/")
inputs = tokenizer(
    texts,
    padding=True,
    truncation=True,
    return_attention_mask=True,
    return_token_type_ids=True,
    return_tensors="np",
)
preds = session.run(None, dict(inputs))[0]

Sorry, I don't have time to dig into this issue further for you.

Alexander-Mark avatar Jun 25 '24 06:06 Alexander-Mark

Applying stale label due to no activity in 30 days