
YOLOv5l model crashing on c5.12xlarge for batch size > 8

Open markurtz opened this issue 3 years ago • 0 comments

From community slack https://discuss-neuralmagic.slack.com/archives/C020FPF3MQX/p1657890578280219:

mt 8:09 AM
Hello! I was using deepsparse on a checkpoint of a yolov5l model generated by --one-shot on a c5.12xlarge and got the following error for batch size >=8:

2022-07-14 20:06:54 deepsparse.benchmark.benchmark_model INFO Thread pinning to cores enabled
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized) (system=avx512, binary=avx512)
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized)
Date: 07-14-2022 @ 20:06:59 UTC
OS: Linux data-workstation 5.4.0-1072-aws #77~18.04.1-Ubuntu SMP Thu Apr 7 21:38:47 UTC 2022
Arch: x86_64
CPU: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Vendor: GenuineIntel
Cores/sockets/threads: [24, 1, 48]
Available cores/sockets/threads: [24, 1, 48]
L1 cache size data/instruction: 32k/32k
L2 cache size: 1Mb
L3 cache size: 35.75Mb
Total memory: 92.2119G
Free memory: 90.8325G

Assertion at src/lib/engine/execution/pyramidal/exec_graph_utils.cpp:240

Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
1# wand::detail::assert_fail(char const*, char const*, int) in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
2# 0x00007FEC9F562EBA in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
3# 0x00007FEC9F565E3A in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
4# 0x00007FEC9F550117 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
5# 0x00007FEC9F4C6C01 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
6# 0x00007FEC9F4C8502 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
7# 0x00007FEC9F4C8563 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
8# 0x00007FECA0C4A040 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
9# 0x00007FEDCFE776DB in /lib/x86_64-linux-gnu/libpthread.so.0
10# clone in /lib/x86_64-linux-gnu/libc.so.6

Please email a copy of this stack trace and any additional information to: [email protected]
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized)
deepsparse_testing.sh: line 2: 30994 Aborted deepsparse.benchmark -b $i checkpoints/logo_l_pruned_quant.onnx

I tried it with the nightly version as well, but it did not work. The process was just killed.
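For anyone trying to reproduce this, here is a minimal sketch of a batch-size sweep that records which sizes abort. The contents of `deepsparse_testing.sh` are not shown in the report, so this only mirrors the one command visible in the log (`deepsparse.benchmark -b $i <model>`). The helper names and the list of batch sizes are hypothetical.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def build_benchmark_cmd(model_path: str, batch_size: int) -> List[str]:
    # Same shape as the command in the log: deepsparse.benchmark -b $i <model>
    return ["deepsparse.benchmark", "-b", str(batch_size), model_path]


def run_cmd(cmd: List[str]) -> int:
    # Exit code of the child process. On POSIX a negative value means the
    # process was terminated by a signal (an abort shows up this way).
    return subprocess.run(cmd).returncode


def sweep_batch_sizes(
    model_path: str,
    batch_sizes: Iterable[int],
    runner: Callable[[List[str]], int] = run_cmd,
) -> Dict[int, int]:
    # Run the benchmark once per batch size and keep each exit code,
    # so a crash at one size does not stop the rest of the sweep.
    return {b: runner(build_benchmark_cmd(model_path, b)) for b in batch_sizes}


def failing_batch_sizes(results: Dict[int, int]) -> List[int]:
    # Any non-zero exit code counts as a failure.
    return sorted(b for b, code in results.items() if code != 0)
```

Calling `sweep_batch_sizes("checkpoints/logo_l_pruned_quant.onnx", [1, 2, 4, 8, 16])` and passing the result to `failing_batch_sizes` would show where the crash starts. Running each size in its own process means an abort at one size does not stop the sweep.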

markurtz, Jul 15 '22 16:07