datinje

12 comments by datinje

Sorry, finishing my earlier thought: can you explain the reason for this element-by-element check?

So I understand the reason for this check was to make the code more general. And your improvement is going to improve perf by 10x for this specific...

Problem fixed by using a different Dockerfile that uses the latest onnx and the latest onnxruntime (1.14.1). Apologies for the trouble.

> onnx-trt parser [filters out](https://github.com/onnx/onnx-tensorrt/blob/main/ModelImporter.cpp#L377) `NonMaxSuppression`, `NonZero`, and `RoiAlign`, so that's why you saw those nodes are placed on CUDA/CPU EP. i also think that many memcpy between CPU/GPU causes...

If I want to test the performance I get by not filtering out these operators, by commenting out the lines at https://github.com/onnx/onnx-tensorrt/blob/main/ModelImporter.cpp#L377, then where shall I modify the ModelImporter.cpp file before...

What if I compile onnxruntime with --use_tensorrt_builtin_parser: will the nodes still be filtered out?

No change if I recompile onnxruntime with --use_tensorrt_builtin_parser. The nodes are still placed on the CPU.
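For reference, a minimal sketch of how such a rebuild is typically invoked. The `--use_tensorrt_builtin_parser` flag is the one discussed above (from onnxruntime's build.sh options); the CUDA/TensorRT paths are placeholders for this particular setup:

```shell
# Sketch of an onnxruntime rebuild with the built-in TensorRT parser.
# Paths below are placeholders; adjust them to the local install.
BUILD_FLAGS="--config Release --parallel --build_wheel --use_tensorrt --use_tensorrt_builtin_parser"

# Actual invocation (run from the onnxruntime source tree):
# ./build.sh ${BUILD_FLAGS} --cuda_home /usr/local/cuda --tensorrt_home /usr/lib/x86_64-linux-gnu

echo "${BUILD_FLAGS}"
```

Note that this only switches which parser the TRT EP uses; as observed above, it does not by itself change which nodes get filtered out.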

Thanks a lot @chilo-ms: I will try to integrate the 2 plugins in my model to test the performance improvement. Hoping that the ONNXRT TRT EP uses the TRT API enqueueV3...

After discussing with NVIDIA how to integrate the plugins, we found out that NMS and NonZero ARE implemented in TensorRT. Cf. - NMS: https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_n_m_s_layer.html - NonZero: https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/class_i_non_zero.html for...

In 1.16.0 there is a new session option, disable_cpu_ep_fallback. How can we set it? And will this prevent NonZero and NMS from falling back to the CPU EP?
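A minimal sketch of setting that option from Python, assuming the session-config key is `"session.disable_cpu_ep_fallback"` (the string is an assumption inferred from the option name; check the 1.16 config-keys header). The onnxruntime calls are shown in comments since they need a real model and the TRT EP installed:

```python
# Assumed session-config key for the 1.16 disable_cpu_ep_fallback option.
def strict_placement_config():
    # "1" requests that unsupported nodes NOT silently fall back to the CPU EP.
    return {"session.disable_cpu_ep_fallback": "1"}

# Applying it (requires onnxruntime >= 1.16 and a real model file):
# import onnxruntime as ort
# so = ort.SessionOptions()
# for key, value in strict_placement_config().items():
#     so.add_session_config_entry(key, value)
# sess = ort.InferenceSession("model.onnx", sess_options=so,
#     providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"])
```

If the option works as its name suggests, session creation should fail with an error when NonZero/NMS cannot be placed on TRT/CUDA, rather than quietly assigning them to the CPU EP.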