After converting an ONNX FP32 model to an INT8 engine with custom calibration, the engine layers still show FP32
Description
I tried to follow the INT8 custom calibration example to build an INT8 engine from an ONNX FP32 model: https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/convert/01_int8_calibration_in_tensorrt
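For reference, my data loader script follows the convention from that example. This is only a minimal sketch, not my exact script; the input name, shape, and batch count below are placeholders for my real model:

```python
# data_loader.py -- minimal sketch of a calibration data loader for
# "polygraphy convert --data-loader-script". Polygraphy looks for a function
# named load_data() that yields feed dicts mapping input names to NumPy arrays.
import numpy as np

INPUT_NAME = "input"            # placeholder: my model's real input name differs
INPUT_SHAPE = (1, 3, 224, 224)  # placeholder: my model's real input shape differs

def load_data():
    for _ in range(10):  # placeholder number of calibration batches
        yield {INPUT_NAME: np.random.rand(*INPUT_SHAPE).astype(np.float32)}
```

The engine itself was then built roughly as in the example:

polygraphy convert fp32_model.onnx --int8 --data-loader-script ./data_loader.py --calibration-cache int8_calib.cache -o int8.engine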
After building the engine, I used the following command to inspect the layers:
polygraphy inspect model int8.engine --model-type engine --show layers
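For anyone reproducing this, the layer information can also be read directly through TensorRT's engine inspector API as a cross-check on Polygraphy's report. A minimal sketch, assuming the engine file is int8.engine as above:

```python
# inspect_engine.py -- dump per-layer information from a serialized engine
# via TensorRT's engine inspector.
import json
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

with open("int8.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

inspector = engine.create_engine_inspector()
info = json.loads(inspector.get_engine_information(trt.LayerInformationFormat.JSON))

# Note: detailed per-layer fields (including precision) only appear if the
# engine was built with ProfilingVerbosity.DETAILED; otherwise the entries
# are just layer names.
for layer in info["Layers"]:
    print(layer)
```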
However, all the layers still show FP32.
Moreover, I tried the debug precision tool to investigate which layers differ and to build a mixed-precision engine. The result is the same (all layers show FP32), and inference becomes much slower than the ONNX FP32 model:
CUDA_VISIBLE_DEVICES=3 polygraphy debug precision fp32_model.onnx --int8 --tactic-sources cublas --verbose -p float32 --calibration-cache int8_calib.cache --check polygraphy run polygraphy_debug.engine --trt --load-inputs golden_input.json --load-outputs golden.json --abs 1e-2
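For completeness, the golden inputs/outputs referenced above were generated from the FP32 ONNX model with ONNX-Runtime, roughly like this (reconstructed from memory, so the exact command may differ slightly):

polygraphy run fp32_model.onnx --onnxrt --save-inputs golden_input.json --save-outputs golden.json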
Environment
TensorRT Version: 10.4
NVIDIA GPU: A100
NVIDIA Driver Version:
CUDA Version: 12.5
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
Can you share your model? Testing this workflow with https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/model/resnet50-v1-12.onnx, the output is as expected.
The model is quite big, so I am unable to upload it here. Is there any other way to share it with you?
The output of the INT8 engine is the same as that of the FP32 ONNX model, but I am wondering why the layers of the INT8 engine still show FP32 when I inspect it with Polygraphy, and why the inference time is roughly three times that of the FP32 ONNX model.
@jinhonglu Sharing via a Google Drive link works as well.
Issue has not received an update in over 14 days. Adding stale label. Please note the issue will be closed in 14 days after being marked stale if there is no update.
This issue was closed because it has been 14 days without activity since it has been marked as stale.