int8 calibration for batch > 1
Hi @rmccorm4, I'd like to ask for some advice on INT8 calibration. I've had no trouble building explicit-batch engines with batch > 1 in FP16, and I've managed to build INT8 explicit-batch engines with batch = 1. However, INT8 calibration doesn't seem to work for batch > 1. It calibrates without errors or failures, and my demo app also runs without errors, so it's getting hard to debug. Do you have any advice? I've tried building the calibration cache with batch = 1 and then using it to build an engine with batch > 1; that appeared to work once, but I haven't been able to reproduce that result.
Hi @maidmentdaniel ,
The int8 code in this repo is pretty outdated - I would encourage you to refer to Polygraphy's INT8 sample. The API is very intuitive to use: https://github.com/NVIDIA/TensorRT/blob/master/tools/Polygraphy/examples/api/04_int8_calibration_in_tensorrt/example.py
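To show why that sample's API is straightforward for batch > 1: Polygraphy's `Calibrator` consumes a plain Python iterable of feed dicts, so batched calibration data is just a matter of yielding batched arrays. A minimal sketch of such a data loader (the input name `"x"` and the NCHW shape are assumptions here; match them to your model's actual inputs):

```python
import numpy as np

def calib_data_loader(num_batches=8, batch_size=4):
    """Yield feed_dicts mapping input names to numpy arrays.

    Polygraphy's Calibrator accepts any iterable like this, e.g.:
        calibrator = Calibrator(data_loader=calib_data_loader())
    Replace the random data with real preprocessed samples, and the
    name/shape ("x", 3x224x224) with your model's actual input.
    """
    for _ in range(num_batches):
        # Each yielded batch has the full batch_size in dim 0,
        # so calibration sees the same batch shape as inference.
        yield {"x": np.random.rand(batch_size, 3, 224, 224).astype(np.float32)}
```

Note that with an explicit-batch ONNX model, the batch dimension of the calibration data should match (or fall within the optimization profile of) the engine you intend to build, which is the usual cause of silent batch > 1 calibration problems.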
Thanks a lot for the pointer. I'm not quite ready to update TensorRT and CUDA yet. If you could point me to some info on how the old API handles dynamic vs. static ONNX files, that would be fantastic.
What version are you currently running? Also, the TensorRT NGC containers are a good way to try different versions without having to update host dependencies: https://ngc.nvidia.com/containers/nvidia:tensorrt