Split operator's attribute split does not take effect (spec is not honored)
Description
My model has a Split operator that is expected to divide the input tensor into sub-tensors according to the specified split sizes.
As depicted in this setting:
This is the split operator link.
But TensorRT divided the tensor into equal parts, that is to say the split setting didn't take effect.
The error information is as follows:
[shapeContext.cpp::operator()::4017] Error Code 4: Shape Error (reshape wildcard -1 has no integer solution. Reshaping [1,18720,8,32] to [1,4990,-1].)
4990 comes from 24953 divided by 5, i.e. TensorRT split the axis into five equal parts.
Environment
TensorRT Version: 10.0
NVIDIA GPU: GeForce RTX 3090
NVIDIA Driver Version: 353
CUDA Version: 12.2
CUDNN Version:
Operating System: ubuntu22
Python Version (if applicable): 3.9.7
PyTorch Version (if applicable): 2.2
Baremetal or Container (if so, version): nvcr.io/nvidia/cuda:12.2.2-cudnn8-devel-ubuntu22.04
1 * 18720 * 8 * 32 / 4990 = 960.38477, which has no integer solution.
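The failing arithmetic can be checked directly:

```python
# Sanity check of the reshape error: reshaping [1, 18720, 8, 32] to
# [1, 4990, -1] requires the total element count to be divisible by 1 * 4990.
total = 1 * 18720 * 8 * 32      # 4,792,320 elements
print(total / 4990)             # 960.3847..., not an integer
assert total % 4990 != 0        # so the -1 wildcard has no integer solution
```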
Your ONNX export is wrong: some ops are dynamic, but they were exported with constant shapes/values. Check for Warning: and TracerWarning: messages during the export process.
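A minimal sketch of what that warning means, in plain Python with NumPy standing in for tensors (build_graph is an illustrative stand-in for the trace step, not a real exporter API): a shape read as a Python int at export time gets frozen into the graph as a constant, so the exported reshape only works for the traced shape.

```python
import numpy as np

# Illustrative sketch: under tracing, a tensor shape converted to a
# Python int is recorded as a constant in the exported graph; this is
# what PyTorch's TracerWarning warns about.
def build_graph(sample):
    n = int(sample.shape[1])                 # frozen at trace time
    def graph(x):
        return x.reshape(x.shape[0], n, -1)  # n is a constant, not dynamic
    return graph

graph = build_graph(np.zeros((1, 4990, 16)))
print(graph(np.zeros((1, 4990, 16))).shape)   # works for the traced shape
# graph(np.zeros((1, 18720, 8, 32)))          # raises: 18720*8*32 is not
#                                             # divisible by 4990
```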
Take note that axis=1. The expected operation is to split the 24953 dimension of [1, 24953, 8, 32] into [18720, 4680, 1170, 299, 84], so that I get five sub-tensors.
But TensorRT divided the tensor into equal parts, that is to say the split setting didn't take effect.
In other words:
24953 / 5 = 4990.6
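For reference, the expected split can be reproduced with NumPy (np.split takes cut indices rather than section sizes, hence the cumulative sum):

```python
import numpy as np

# Expected Split behavior: axis=1, split sizes [18720, 4680, 1170, 299, 84]
x = np.zeros((1, 24953, 8, 32), dtype=np.float32)
sizes = [18720, 4680, 1170, 299, 84]
assert sum(sizes) == 24953                    # sizes must cover the axis

# convert section sizes to cumulative split indices for np.split
parts = np.split(x, np.cumsum(sizes)[:-1], axis=1)
print([p.shape for p in parts])
# the first part has shape (1, 18720, 8, 32): the tensor TensorRT then
# tries to reshape to [1, 4990, -1], which cannot work
```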
The answer above explains it, but I'll reiterate.
From what OP shared, Split seems to be doing its job: splitting a tensor of shape [1,24953,8,32] along axis=1 (presumably) into 5 tensors. The first of them (say output1) will have shape [1,18720,8,32]. The problem seems to be coming from a reshape of output1 to shape [1,4990,-1], which is not possible. OP needs to figure out where that reshape is coming from.
A good way to narrow down whether the issue is in TRT itself is to run your onnx model in polygraphy with another backend (like onnxrt).
polygraphy run <your_model>.onnx --onnxrt
If your model has trouble running with onnxrt also, then there's a very good chance the problem is with the model itself. If not, please post here and consider sharing your model.
Hi @brb-nv, I used this command and the result is fine.
polygraphy run model.onnx --onnxrt
But when I run with trt instead of onnxrt, it starts reporting errors.
I shared the model using Google Drive.
Thank you for verifying. Kindly provide access to the onnx file.
Hi @brb-nv, I solved the original issue by running polygraphy surgeon sanitize --fold-constants, but there is still an error:
[E] 2: [myelinBuilderUtils.cpp::getMyelinSupportType::1270] Error Code 2: Internal Error (ForeignNode does not support data-dependent shape for now.)
[!] Invalid Engine. Please ensure the engine was built correctly
[E] FAILED | Runtime: 16.287s | Command: /home/osmagic/anaconda3/envs/pytorch/bin/polygraphy run --trt weights/out_nms_0524_sim_san.onnx
Could you please help me find a solution? And should I create a new issue for this?
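For context on that error: a "data-dependent shape" is one where an output's size depends on tensor values rather than only on input shapes, and the model filename (out_nms_...) suggests an NMS node, which behaves exactly this way. A minimal NumPy illustration of the concept:

```python
import numpy as np

# Data-dependent shape: the output size depends on the *values* in the
# tensor, not only on the input shapes. NMS / NonZero nodes are the
# classic case; how many boxes survive cannot be known from shapes alone.
scores = np.array([0.9, 0.1, 0.8, 0.2])
kept = scores[scores > 0.5]     # shape depends on the data
print(kept.shape)               # (2,) for this input; varies with the values
```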
I changed the access permissions on the model; the link is now accessible.