Build engine file failed with INT8 calibration mode
Description
I exported the ONNX model and built an engine in FP16 mode successfully, but building the engine in INT8 calibration mode fails.
The error:
...
[06/30/2022-17:48:48] [TRT] [I] Total Activation Memory: 37483520
[06/30/2022-17:48:55] [TRT] [V] Using cublasLt as a tactic source
[06/30/2022-17:48:55] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2596, GPU 2320 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Using cuDNN as a tactic source
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2596, GPU 2328 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Engine generation completed in 33.4191 seconds.
[06/30/2022-17:48:56] [TRT] [V] Using cublasLt as a tactic source
[06/30/2022-17:48:56] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2595, GPU 2304 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Using cuDNN as a tactic source
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2595, GPU 2312 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Total per-runner device persistent memory is 0
[06/30/2022-17:48:56] [TRT] [V] Total per-runner host persistent memory is 17296
[06/30/2022-17:48:56] [TRT] [V] Allocated activation device memory of size 37483520
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +35, now: CPU 0, GPU 199 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Calculating Maxima
[06/30/2022-17:48:56] [TRT] [I] Starting Calibration.
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
[06/30/2022-17:48:56] [TRT] [E] 1: [calibrator.cu::absTensorMax::135] Error Code 1: Cuda Runtime (invalid configuration argument)
Code of build_engine
def build_engine(
    runtime: Runtime,
    onnx_file_path: str,
    logger: Logger,
    min_shape: Tuple[int, int],
    optimal_shape: Tuple[int, int],
    max_shape: Tuple[int, int],
    workspace_size: int,
    fp16: bool,
    int8: bool,
    calibrator: trt.IInt8Calibrator = None
) -> ICudaEngine:
    """
    Convert ONNX model to TensorRT engine.
    :param runtime: TensorRT runtime used for inference calls / model building
    :param onnx_file_path: path to the ONNX model
    :param logger: specific logger to TensorRT
    :param min_shape: minimal shape of input tensors
    :param optimal_shape: optimal shape of input tensors
    :param max_shape: maximal shape of input tensors
    :param workspace_size: GPU memory to use during model building
    :param fp16: enable FP16 precision
    :param int8: enable INT-8 quantization
    :param calibrator: calibrator to use for INT-8 quantization
    :return: TensorRT engine to run inference
    """
    with trt.Builder(logger) as builder:
        with builder.create_network(
            # Explicit batch mode: all dimensions are explicit and can be dynamic
            flags=1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        ) as network_definition:
            with trt.OnnxParser(network_definition, logger) as parser:
                builder.max_batch_size = max_shape[0]
                config: IBuilderConfig = builder.create_builder_config()
                config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
                config.max_workspace_size = workspace_size
                # enable CUDNN, CUBLAS and CUBLAS_LT
                config.set_tactic_sources(
                    tactic_sources=1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT) | 1 << int(trt.TacticSource.CUDNN)
                )
                if fp16:
                    config.set_flag(trt.BuilderFlag.FP16)
                if int8:
                    assert calibrator is not None, "Calibration is required for int8 quantization"
                    config.set_flag(trt.BuilderFlag.INT8)
                    config.int8_calibrator = calibrator
                #config.set_flag(trt.BuilderFlag.DISABLE_TIMING_CACHE)
                # Ask the builder to prefer the type constraints/hints when choosing layer implementations
                # instead of choosing the fastest
                #config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
                print('Parsing onnx model ...')
                with open(onnx_file_path, "rb") as f:
                    if not parser.parse(f.read()):
                        print(parser.get_error(0))
                print('Parsing onnx model finished')
                # The builder selects the kernels that result in the lowest runtime for the optimum
                # input tensor dimensions, and are valid for all input tensor sizes in the valid range
                # between minimum and maximum dimensions
                # At least one optimization profile is required with dynamically resizable inputs
                profile: IOptimizationProfile = builder.create_optimization_profile()
                for num_input in range(network_definition.num_inputs):
                    profile.set_shape(
                        input=network_definition.get_input(num_input).name,
                        min=min_shape,
                        opt=optimal_shape,
                        max=max_shape,
                    )
                config.add_optimization_profile(profile)
                if fp16:
                    # Noticeable differences have been observed when converting some layers in FP16
                    # Force those layers in FP32
                    network_definition = _fix_fp16_network(network_definition)
                print('[Start build_serialized_network...]')
                trt_engine = builder.build_serialized_network(network_definition, config)
                engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
                return engine
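One thing worth ruling out: build_engine prints only the first parser error and then goes on to build, so a partly parsed network could reach calibration. A minimal sketch of a stricter parse step follows, assuming the parser object exposes parse(), num_errors and get_error() as trt.OnnxParser does. The helper name parse_onnx_or_raise is hypothetical and not part of the original code.

```python
def parse_onnx_or_raise(parser, onnx_file_path: str) -> None:
    """Parse an ONNX file and stop the build if the parser reports any error.

    `parser` is expected to behave like trt.OnnxParser (parse, num_errors,
    get_error). This helper is a hypothetical addition, not the issue's code.
    """
    with open(onnx_file_path, "rb") as f:
        ok = parser.parse(f.read())
    if not ok:
        # Collect every reported error instead of only the first one.
        errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
        raise RuntimeError("ONNX parsing failed:\n" + "\n".join(errors))
```

Inside build_engine this would replace the `with open(...)` block, so that a failed parse stops the build before calibration starts.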
Code of the INT8 calibrator:
Note that self._batch_stream feeds the calibration samples.
class Int8Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(
        self,
        batch_stream,
        input_shape: tuple = (1,3,8,112,112),
        calib_cache: str = None
    ):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self._batch_stream = batch_stream
        self._input_shape = input_shape
        self._calib_cache = calib_cache
        # Allocate gpu memory
        self._device_input = cuda.mem_alloc(trt.volume(self._input_shape) * trt.float32.itemsize)

    def get_batch_size(self) -> int:
        """Get batch size.
        """
        self._batch_size = self._input_shape[0]
        return self._batch_size

    def get_batch(self, names): # pylint: disable=unused-argument
        """Get a batch of input for calibration.
        Args:
            names (List[str]): list of file names
        Returns:
            list of device memory pointers set to the memory containing
            each network input data, or an empty list if there are no more
            batches for calibration
        """
        print('Calibration starting...')
        try:
            data = self._batch_stream.next_batch()
            cuda.memcpy_htod(self._device_input, data)
            print('memcpy_htod finished...')
            return [int(self._device_input)]
        except StopIteration:
            # When we're out of batches, we return either [] or None.
            # This signals to TensorRT that there is no calibration data remaining.
            return None

    def read_calibration_cache(self):
        """Load a calibration cache.
        Returns:
            a cache object or None if there is no data
        """
        # If there is a cache, use it instead of calibrating again. Otherwise,
        # return None.
        if os.path.exists(self._calib_cache):
            with open(self._calib_cache, "rb") as calib_cache_file:
                return calib_cache_file.read()
        return None

    def write_calibration_cache(self, cache: memoryview):
        """Save a calibration cache.
        Args:
            cache (memoryview): the calibration cache to write
        """
        print('[Write calibration cache file]')
        with open(self._calib_cache, "wb") as calib_cache_file:
            calib_cache_file.write(cache)
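The device buffer in __init__ is sized from input_shape as float32, so each batch returned by next_batch() has to occupy exactly that many bytes before it is copied with memcpy_htod. A minimal host-side size check might look like the sketch below. The names expected_nbytes and check_batch are hypothetical, and the constant 4 is assumed to match trt.float32.itemsize.

```python
from typing import Sequence

FLOAT32_BYTES = 4  # assumed to equal trt.float32.itemsize


def expected_nbytes(shape: Sequence[int], itemsize: int = FLOAT32_BYTES) -> int:
    """Return the byte size of a dense tensor with the given static shape."""
    volume = 1
    for dim in shape:
        if dim <= 0:
            # Dynamic (-1) or empty dimensions cannot be used to size a buffer.
            raise ValueError("shape must be fully static, got %r" % (tuple(shape),))
        volume *= dim
    return volume * itemsize


def check_batch(batch_nbytes: int, shape: Sequence[int]) -> None:
    """Raise if a calibration batch does not fill the device buffer exactly."""
    want = expected_nbytes(shape)
    if batch_nbytes != want:
        raise ValueError(
            "calibration batch is %d bytes, buffer expects %d" % (batch_nbytes, want)
        )
```

With NumPy batches this could be called as check_batch(data.nbytes, self._input_shape) right before the copy. For the default shape (1,3,8,112,112) the buffer is 1204224 bytes.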
Environment
TensorRT Version: 8.2.1.8
NVIDIA GPU: A100
NVIDIA Driver Version: 470.103.01
CUDA Version: 11.1
CUDNN Version: 8.4.0
Operating System: Linux version 5.4.0-81-generic (buildd@lgw01-amd64-052) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04))
Python Version (if applicable): 3.7.13
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.8.1+cu111
Baremetal or Container (if so, version):
@MaxeeCR Could you check if you have set the calibration profile correctly? See: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#int8-calib-dynamic-shapes
I added config.add_optimization_profile(profile) to the build_engine function, but the same error remains.
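For reference, the linked section is about a separate calibration profile, registered with config.set_calibration_profile(), which is distinct from add_optimization_profile(). As far as the guide describes it, calibration runs at the profile's opt shape, so the batches produced by the calibrator have to match that shape. A minimal sketch is below, assuming builder, network and config are the TensorRT objects from build_engine. The helper name add_profiles and the calib_shape parameter are hypothetical.

```python
def add_profiles(builder, network, config, min_shape, opt_shape, max_shape, calib_shape):
    """Register an optimization profile plus a separate calibration profile.

    Hypothetical helper: builder, network and config are expected to be the
    trt.Builder, trt.INetworkDefinition and trt.IBuilderConfig objects.
    """
    profile = builder.create_optimization_profile()
    calib_profile = builder.create_optimization_profile()
    for i in range(network.num_inputs):
        name = network.get_input(i).name
        profile.set_shape(name, min_shape, opt_shape, max_shape)
        # Pinned to one shape for simplicity (a choice made in this sketch);
        # it should equal the shape of the batches the calibrator returns.
        calib_profile.set_shape(name, calib_shape, calib_shape, calib_shape)
    config.add_optimization_profile(profile)
    config.set_calibration_profile(calib_profile)
    return profile, calib_profile
```

With the calibrator above, calib_shape would be (1, 3, 8, 112, 112), the same tuple passed as input_shape.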
@MaxeeCR Could you provide your ONNX file so that we can repro and debug this issue? Thanks
onnx model: https://drive.google.com/file/d/1EpWYDYZQ4Pooa-sfCkxkA-a3FG5FzRIb/view?usp=sharing
@nvpohanh I got the same error when building the engine with INT8; a static shape is used as the input.

I've also encountered this situation.
What our cases have in common is that the input has a rank of 5.
Does TensorRT support video input when building an engine in INT8 calibration mode?