datinje

12 comments by datinje

Sorry, finishing my earlier thought: can you explain the reason for this element-by-element check?

So I understand the reason for this check was to make the code more general. And your improvement is going to improve perf by 10x for this specific...

Problem fixed by using a different Dockerfile that uses the latest onnx and the latest onnxruntime (1.14.1). Apologies for the trouble.

> onnx-trt parser [filters out](https://github.com/onnx/onnx-tensorrt/blob/main/ModelImporter.cpp#L377) `NonMaxSuppression`, `NonZero`, and `RoiAlign`, so that's why you saw those nodes are placed on CUDA/CPU EP. i also think that many memcpy between CPU/GPU causes...

If I want to test the performance I get by not filtering out these operators, by commenting out the lines at https://github.com/onnx/onnx-tensorrt/blob/main/ModelImporter.cpp#L377, then where shall I modify the ModelImporter.cpp file before...

What if I compile onnxruntime with --use_tensorrt_builtin_parser: will the nodes still be filtered out?

No change if I recompile onnxruntime with --use_tensorrt_builtin_parser. The nodes are still placed on the CPU.
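For reference, a minimal sketch of how such a rebuild is typically invoked. The `--use_tensorrt_builtin_parser` flag is the one discussed above (from onnxruntime's build.sh options); the CUDA/TensorRT paths are placeholders for this particular setup:

```shell
# Sketch of an onnxruntime rebuild with the built-in TensorRT parser.
# Paths below are placeholders; adjust them to the local install.
BUILD_FLAGS="--config Release --parallel --build_wheel --use_tensorrt --use_tensorrt_builtin_parser"

# Actual invocation (run from the onnxruntime source tree):
# ./build.sh ${BUILD_FLAGS} --cuda_home /usr/local/cuda --tensorrt_home /usr/lib/x86_64-linux-gnu

echo "${BUILD_FLAGS}"
```

Note that this only switches which parser the TRT EP uses; as observed above, it does not by itself change which nodes get filtered out.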

Thanks a lot @chilo-ms: I will try to integrate the 2 plugins in my model to test the performance improvement. Hoping that the ONNXRT TRT EP uses the TRT API enqueueV3...

After discussing with NVIDIA how to integrate the plugins, we found out that NMS and NonZero ARE implemented in TensorRT. Cf. - NMS: https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_n_m_s_layer.html - NonZero: https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/class_i_non_zero.html for...

In 1.16.0 there is a new session option, disable_cpu_ep_fallback. How can we set it? And will this prevent NonZero and NMS from falling back to the CPU EP?
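A minimal sketch of setting that option from Python, assuming the session-config key is `"session.disable_cpu_ep_fallback"` (the string is an assumption inferred from the option name; check the 1.16 config-keys header). The onnxruntime calls are shown in comments since they need a real model and the TRT EP installed:

```python
# Assumed session-config key for the 1.16 disable_cpu_ep_fallback option.
def strict_placement_config():
    # "1" requests that unsupported nodes NOT silently fall back to the CPU EP.
    return {"session.disable_cpu_ep_fallback": "1"}

# Applying it (requires onnxruntime >= 1.16 and a real model file):
# import onnxruntime as ort
# so = ort.SessionOptions()
# for key, value in strict_placement_config().items():
#     so.add_session_config_entry(key, value)
# sess = ort.InferenceSession("model.onnx", sess_options=so,
#     providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"])
```

If the option works as its name suggests, session creation should fail with an error when NonZero/NMS cannot be placed on TRT/CUDA, rather than quietly assigning them to the CPU EP.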