
After converting an ONNX FP32 model to an INT8 engine with custom calibration, the engine layers still show FP32


Description

I tried to follow the INT8 custom calibration example to build an INT8 engine from an ONNX FP32 model: https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/convert/01_int8_calibration_in_tensorrt
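For context, the example drives calibration with a Python data loader script passed via --data-loader-script. A minimal sketch of such a script (the input name, shape, and random data below are placeholders for my real calibration samples):

```python
import numpy as np


def load_data():
    # Polygraphy calls this generator to obtain calibration batches.
    # Each yielded dict maps an input tensor name to a NumPy array.
    # NOTE: the input name "input", the shape, and the random data are
    # placeholders -- real calibration samples matching the model's actual
    # input names and shapes should be used instead.
    for _ in range(5):
        yield {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
```

The engine is then built roughly as in the linked example, e.g. polygraphy convert fp32_model.onnx --int8 --data-loader-script ./data_loader.py --calibration-cache int8_calib.cache -o int8.engine.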

After building the engine, I inspected the layers with:

polygraphy inspect model int8.engine --model-type engine --show layers

However, all of the layers still show FP32.

Moreover, I tried the debug precision tool to investigate which layers need higher precision and build a mixed-precision engine. The result is the same, and inference becomes much slower than the ONNX FP32 model:

CUDA_VISIBLE_DEVICES=3 polygraphy debug precision fp32_model.onnx --int8 --tactic-sources cublas --verbose -p float32 --calibration-cache int8_calib.cache --check polygraphy run polygraphy_debug.engine --trt --load-inputs golden_input.json --load-outputs golden.json --abs 1e-2
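For reference, the accuracy check that the --check command performs against the golden outputs can also be run standalone through Polygraphy's Python API. A rough sketch, with file names assumed, comparing the INT8 engine against ONNX Runtime at the same 1e-2 absolute tolerance:

```python
from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromBytes, TrtRunner
from polygraphy.comparator import Comparator, CompareFunc

# Lazy loaders: the ONNX Runtime session and the deserialized TensorRT engine.
build_session = SessionFromOnnx("fp32_model.onnx")
load_engine = EngineFromBytes(BytesFromPath("int8.engine"))

# Run the same (randomly generated) inputs through both runners,
# then compare outputs with an absolute tolerance of 1e-2.
run_results = Comparator.run([OnnxrtRunner(build_session), TrtRunner(load_engine)])
passed = bool(Comparator.compare_accuracy(
    run_results, compare_func=CompareFunc.simple(atol=1e-2)))
print("Accuracy check passed:", passed)
```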

Environment

TensorRT Version: 10.4

NVIDIA GPU: A100

NVIDIA Driver Version: 12.5

CUDA Version:

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

jinhonglu avatar Jan 28 '25 09:01 jinhonglu

Can you share your model? Testing this workflow with https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/model/resnet50-v1-12.onnx, the output is as expected.

kevinch-nv avatar Feb 10 '25 22:02 kevinch-nv

> Can you share your model? Testing this workflow with https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/model/resnet50-v1-12.onnx, the output is as expected.

The model is quite big, so I am unable to upload it here. Is there any other way to share it with you?

The output of the INT8 engine is the same as the FP32 ONNX model, but I am wondering why the layers of the INT8 engine still show FP32 when I inspect it with Polygraphy.

Also, the inference time is about three times higher than with the FP32 ONNX model.
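For reference, the per-layer precisions that Polygraphy reports can also be dumped directly from the deserialized engine with the TensorRT Python API. A rough sketch (the engine file name is assumed; per-layer detail requires the engine to have been built with profiling_verbosity set to DETAILED):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Deserialize the already-built engine from disk.
with open("int8.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# The engine inspector reports, per layer, the chosen tactic and precision.
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```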

jinhonglu avatar Feb 11 '25 01:02 jinhonglu

@jinhonglu Sharing via a Google Drive link works as well.

poweiw avatar May 29 '25 21:05 poweiw

Issue has not received an update in over 14 days. Adding stale label. Please note the issue will be closed in 14 days after being marked stale if there is no update.

github-actions[bot] avatar Jun 18 '25 23:06 github-actions[bot]

This issue was closed because it has been 14 days without activity since it has been marked as stale.

github-actions[bot] avatar Jul 03 '25 00:07 github-actions[bot]