How can I adjust the position of quantization nodes to reduce data conversion?
I tried QAT for YOLO11-det and obtained a QDQ network, but after converting it to a TRT engine I find many layers converting data from int8 to fp16 and from fp16 back to int8.
I have run some tests: I set the '_input_quantizer' of the 'QuantizeLinear' after 'Split' equal to the '_input_quantizer' of the 'QuantizeLinear' after 'Concat', but it doesn't work. Please help.
To simplify the problem, we focus only on the subgraphs of the 3rd to 6th convolutional layers, as shown below.
Part of the problem graph is shown below:
The TRT graph is below:
The TRT SVG is:
The ONNX graph is:
By observing the automatic and manual placement of QDQ nodes in YOLOv7, I roughly understand that the parameters within the red box below need to be set equal. However, this rule doesn't seem to work in YOLOv11, which has different layers. Moreover, after setting them equal, why aren't they fully merged? Why are there two consecutive QDQ nodes on the right? It's driving me crazy.
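A minimal sketch of why equal scales matter (all names and values below are mine, not from the models above): a DequantizeLinear followed by a QuantizeLinear is the identity on int8 values only when both nodes use the same scale, in which case TensorRT can elide the pair and keep the tensor in int8. With mismatched scales the pair is a real re-quantization, which the engine must execute, and that is what shows up as int8 → fp16 → int8 conversion layers.

```python
def quantize(x: float, scale: float) -> int:
    """QuantizeLinear: real value -> int8 (symmetric, zero-point 0)."""
    q = round(x / scale)
    return max(-128, min(127, q))

def dequantize(q: int, scale: float) -> float:
    """DequantizeLinear: int8 -> real value."""
    return q * scale

shared = 0.05       # illustrative scale used by both DQ and the following Q
mismatched = 0.07   # illustrative different scale on the following Q

for q in (-128, -3, 0, 42, 127):
    # Same scale on both nodes: DQ followed by Q maps every int8 value to
    # itself, so the pair can be removed and the tensor stays int8.
    assert quantize(dequantize(q, shared), shared) == q

# Different scales: the pair changes the stored integer, so it cannot be
# elided and the engine inserts real conversion work around it.
print(quantize(dequantize(100, shared), mismatched))  # 71, not 100
```

This is only a model of the elision rule, not of TensorRT's actual fusion pass; it suggests why making the quantizers around Split/Concat share a scale is necessary, though other graph patterns can still block the merge.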
Hello, may I ask how to visualize a .trt engine?
@Vieeo You can use NVIDIA's trt-engine-explorer (trex): export the engine's layer info as JSON (e.g. with `trtexec --loadEngine=... --exportLayerInfo=... --profilingVerbosity=detailed`), then load the JSON with trex to draw the engine graph.