TensorRT
ONNX conversion produces a lower-performance engine
I am using the following command to convert an ONNX model:
trtexec --verbose --onnx=./resnet50-v1-12.onnx --fp16 \
--minShapes=data:1x3x224x224 \
--optShapes=data:32x3x224x224 \
--maxShapes=data:1024x3x224x224 \
--saveEngine=./resnet50_fp16-verbose.trt --best
in the Docker image nvcr.io/nvidia/tensorrt:25.08-py3.

On H200, the verbose log shows per-layer tactics:
[09/09/2025-11:19:31] [V] [TRT] Engine Layer Information:
Layer(Reformat): Reformatting CopyNode for Input Tensor 0 to resnetv17_conv0_fwd + resnetv17_batchnorm0_fwd + resnetv17_relu0_fwd + resnetv17_pool0_fwd, Tactic: 0x00000000000003ea, data (Float[-1,3,224,224]) -> Reformatted Input Tensor 0 to resnetv17_conv0_fwd + resnetv17_batchnorm0_fwd + resnetv17_relu0_fwd + resnetv17_pool0_fwd (Int8[-1,3,224,224])
Layer(CaskConvActPool): resnetv17_conv0_fwd + resnetv17_batchnorm0_fwd + resnetv17_relu0_fwd + resnetv17_pool0_fwd, Tactic: 0x6ec9d99750749211, Reformatted Input Tensor 0 to resnetv17_conv0_fwd + resnetv17_batchnorm0_fwd + resnetv17_relu0_fwd + resnetv17_pool0_fwd (Int8[-1,3,224,224]) -> resnetv17_pool0_fwd (Int8[-1,64:32,56,56])
Layer(CaskConvolution): resnetv17_stage1_conv0_fwd + resnetv17_stage1_batchnorm0_fwd + resnetv17_stage1_relu0_fwd, Tactic: 0x2640501019a61dc2, resnetv17_pool0_fwd (Int8[-1,64:32,56,56]) -> resnetv17_stage1_relu0_fwd (Int8[-1,64:32,56,56])
Layer(CaskConvolution): resnetv17_stage1_conv1_fwd + resnetv17_stage1_batchnorm1_fwd + resnetv17_stage1_relu1_fwd, Tactic: 0x9dafb2758560cc1d, resnetv17_stage1_relu0_fwd (Int8[-1,64:32,56,56]) -> resnetv17_stage1_relu1_fwd (Int8[-1,64:32,56,56])
Layer(CaskConvolution): resnetv17_stage1_conv2_fwd + resnetv17_stage1_batchnorm2_fwd, Tactic: 0xa433705d1adff009, resnetv17_stage1_relu1_fwd (Int8[-1,64:32,56,56]) -> resnetv17_stage1_batchnorm2_fwd (Int8[-1,256:32,56,56])
Layer(CaskConvolution): resnetv17_stage1_conv3_fwd + resnetv17_stage1_batchnorm3_fwd + resnetv17_stage1__plus0 + resnetv17_stage1_activation0, Tactic: 0x3ffcb62b1c6bb94f, resnetv17_pool0_fwd (Int8[-1,64:32,56,56]), resnetv17_stage1_batchnorm2_fwd (Int8[-1,256:32,56,56]) -> resnetv17_stage1_activation0 (Int8[-1,256:32,56,56])
Layer(CaskConvolution): resnetv17_stage1_conv4_fwd + resnetv17_stage1_batchnorm4_fwd + resnetv17_stage1_relu2_fwd, Tactic: 0x41bdb46e3c2617e5, resnetv17_stage1_activation0 (Int8[-1,256:32,56,56]) -> resnetv17_stage1_relu2_fwd (Int8[-1,64:32,56,56])
Layer(CaskConvolution): resnetv17_stage1_conv5_fwd + resnetv17_stage1_batchnorm5_fwd + resnetv17_stage1_relu3_fwd, Tactic: 0x9dafb2758560cc1d, resnetv17_stage1_relu2_fwd (Int8[-1,64:32,56,56]) -> resnetv17_stage1_relu3_fwd (Int8[-1,64:32,56,56])
Layer(CaskConvolution): resnetv17_stage1_conv6_fwd + resnetv17_stage1_batchnorm6_fwd + resnetv17_stage1__plus1 + resnetv17_stage1_activation1, Tactic: 0x3ffcb62b1c6bb94f, resnetv17_stage1_relu3_fwd (Int8[-1,64:32,56,56]), resnetv17_stage1_activation0 (Int8[-1,256:32,56,56]) -> resnetv17_stage1_activation1 (Int8[-1,256:32,56,56])
Layer(CaskConvolution): resnetv17_stage1_conv7_fwd + resnetv17_stage1_batchnorm7_fwd + resnetv17_stage1_relu4_fwd, Tactic: 0x41bdb46e3c2617e5, resnetv17_stage1_activation1 (Int8[-1,256:32,56,56]) -> resnetv17_stage1_relu4_fwd (Int8[-1,64:32,56,56])
...
On B200:
[09/09/2025-11:28:18] [V] [TRT] Engine Layer Information:
Layer(Myelin): {ForeignNode[Quantize 0...resnetv17_dense0_fwd_castOut]}, Tactic: 0x0000000000000000, data (Float[-1,3,224,224]) -> resnetv17_dense0_fwd (Float[-1,1000])
It seems that the builder is doing much less optimization on B200: instead of individually fused layers, the entire network is collapsed into a single Myelin ForeignNode. How can I fix this?
Different GPU architectures get different tactics from the TensorRT compiler. On newer architectures such as Blackwell (B200), the builder hands the whole graph to the Myelin backend, which shows up as a single ForeignNode in the engine layer information. The per-layer fusions still happen inside that node; they are just not listed individually, so the sparse layer dump does not by itself mean the engine is less optimized or slower.
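
To confirm whether the B200 engine is actually slower, benchmark both engines at the same batch size rather than comparing layer dumps. A minimal check with standard trtexec flags (paths and batch size here are just examples, adjust to your setup):

trtexec --loadEngine=./resnet50_fp16-verbose.trt \
--shapes=data:32x3x224x224 \
--warmUp=500 --iterations=100

Run the same command on each GPU and compare the reported throughput and median latency. If the B200 engine really is slower, adding --dumpProfile --separateProfileRun prints per-layer (or per-ForeignNode) timings that can help narrow down where the time goes.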