
Build engine file failed with INT8 calibration mode

MaxeeCR opened this issue 3 years ago · 5 comments

Description

I exported the ONNX model and built an engine in FP16 mode successfully, but this error occurs when building the engine in INT8 calibration mode.

The error:

...
[06/30/2022-17:48:48] [TRT] [I] Total Activation Memory: 37483520
[06/30/2022-17:48:55] [TRT] [V] Using cublasLt as a tactic source
[06/30/2022-17:48:55] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2596, GPU 2320 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Using cuDNN as a tactic source
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2596, GPU 2328 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Engine generation completed in 33.4191 seconds.
[06/30/2022-17:48:56] [TRT] [V] Using cublasLt as a tactic source
[06/30/2022-17:48:56] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2595, GPU 2304 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Using cuDNN as a tactic source
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2595, GPU 2312 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Total per-runner device persistent memory is 0
[06/30/2022-17:48:56] [TRT] [V] Total per-runner host persistent memory is 17296
[06/30/2022-17:48:56] [TRT] [V] Allocated activation device memory of size 37483520
[06/30/2022-17:48:56] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +35, now: CPU 0, GPU 199 (MiB)
[06/30/2022-17:48:56] [TRT] [V] Calculating Maxima
[06/30/2022-17:48:56] [TRT] [I] Starting Calibration.
[06/30/2022-17:48:56] [TRT] [E] 2: [executionContext.cpp::commonEmitDebugTensor::1202] Error Code 2: Internal Error (Assertion regionDims.nbDims >= 0 && regionDims.nbDims <= (Dims::MAX_DIMS - 1) failed. invalid regionDims dimensions)
... (the same Error Code 2 assertion failure repeated 19 more times) ...
[06/30/2022-17:48:56] [TRT] [E] 1: [calibrator.cu::absTensorMax::135] Error Code 1: Cuda Runtime (invalid configuration argument)

Code of build_engine:

# Imports assumed by this snippet (not shown in the original post):
from typing import Tuple

import tensorrt as trt
from tensorrt import ICudaEngine, IBuilderConfig, IOptimizationProfile, Logger, Runtime


def build_engine(
    runtime: Runtime,
    onnx_file_path: str,
    logger: Logger,
    min_shape: Tuple[int, ...],
    optimal_shape: Tuple[int, ...],
    max_shape: Tuple[int, ...],
    workspace_size: int,
    fp16: bool,
    int8: bool,
    calibrator: trt.IInt8Calibrator = None
) -> ICudaEngine:
    """
    Convert ONNX model to TensorRT engine.
    
    :param runtime: TensorRT runtime used for inference calls / model building
    :param onnx_file_path: path to the ONNX model
    :param logger: specific logger to TensorRT
    :param min_shape: minimal shape of input tensors
    :param optimal_shape: optimal shape of input tensors
    :param max_shape: maximal shape of input tensors
    :param workspace_size: GPU memory to use during model building
    :param fp16: enable FP16 precision
    :param int8: enable INT-8 quantization
    :param calibrator: calibrator to use for INT-8 quantization
    :return: TensorRT engine to run inference
    """
    with trt.Builder(logger) as builder:
        with builder.create_network(
            # Explicit batch mode: all dimensions are explicit and can be dynamic
            flags=1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        ) as network_definition:
            with trt.OnnxParser(network_definition, logger) as parser:
                builder.max_batch_size = max_shape[0]
                config: IBuilderConfig = builder.create_builder_config()
                config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
                config.max_workspace_size = workspace_size
                # Enable cuBLAS, cuBLAS LT and cuDNN as tactic sources
                config.set_tactic_sources(
                    tactic_sources=1 << int(trt.TacticSource.CUBLAS)
                    | 1 << int(trt.TacticSource.CUBLAS_LT)
                    | 1 << int(trt.TacticSource.CUDNN)
                )
                
                if fp16:
                    config.set_flag(trt.BuilderFlag.FP16)
                if int8:
                    assert calibrator is not None, "Calibration is required for int8 quantization"
                    config.set_flag(trt.BuilderFlag.INT8)
                    config.int8_calibrator = calibrator
                #config.set_flag(trt.BuilderFlag.DISABLE_TIMING_CACHE)
                
                # Ask the builder to prefer the type constraints/hints when choosing layer implementations
                # instead of choosing the fastest
                #config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
                
                print('Parsing onnx model ...')
                with open(onnx_file_path, "rb") as f:
                    if not parser.parse(f.read()):
                        # Print every parser error, not just the first one
                        for error_idx in range(parser.num_errors):
                            print(parser.get_error(error_idx))
                print('Parsing onnx model finished')
                
                # The builder selects the kernels that result in the lowest runtime for the optimum
                # input tensor dimensions, and are valid for all input tensor sizes in the valid range
                # between minimum and maximum dimensions
                # At least one optimization profile is required with dynamically resizable inputs
                profile: IOptimizationProfile = builder.create_optimization_profile()
                for num_input in range(network_definition.num_inputs):
                    profile.set_shape(
                        input=network_definition.get_input(num_input).name,
                        min=min_shape,
                        opt=optimal_shape,
                        max=max_shape,
                    )
                config.add_optimization_profile(profile)

                if fp16:
                    # Noticeable differences have been observed when converting some layers to FP16,
                    # so force those layers to run in FP32
                    # (_fix_fp16_network is defined elsewhere in the project)
                    network_definition = _fix_fp16_network(network_definition)
                print('[Start build_serialized_network...]')
                trt_engine = builder.build_serialized_network(network_definition, config)
                engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
                return engine
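
For context, a hypothetical invocation of this function (paths and shapes are illustrative; the Int8Calibrator class is shown below):

trt_logger = trt.Logger(trt.Logger.VERBOSE)
runtime = trt.Runtime(trt_logger)
calibrator = Int8Calibrator(batch_stream=batch_stream)  # batch_stream feeds the calibration samples
engine = build_engine(
    runtime=runtime,
    onnx_file_path="model.onnx",        # hypothetical path
    logger=trt_logger,
    min_shape=(1, 3, 8, 112, 112),      # 5-D video input (N, C, D, H, W)
    optimal_shape=(1, 3, 8, 112, 112),
    max_shape=(1, 3, 8, 112, 112),
    workspace_size=1 << 30,             # 1 GiB
    fp16=False,
    int8=True,
    calibrator=calibrator,
)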

Code of int8-calibrator:

Note that self._batch_stream feeds the calibration samples.

# Imports assumed by this snippet (not shown in the original post):
import os

import pycuda.autoinit  # noqa: F401 -- creates the CUDA context needed by mem_alloc
import pycuda.driver as cuda
import tensorrt as trt


class Int8Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(
            self,
            batch_stream,
            input_shape: tuple = (1, 3, 8, 112, 112),
            calib_cache: str = None
        ):

        trt.IInt8EntropyCalibrator2.__init__(self)

        self._batch_stream = batch_stream
        self._input_shape = input_shape
        self._calib_cache = calib_cache

        # Allocate GPU memory for one calibration batch (float32 input)
        self._device_input = cuda.mem_alloc(trt.volume(self._input_shape) * trt.float32.itemsize)

    
    def get_batch_size(self) -> int:
        """Get batch size.
        """
        self._batch_size = self._input_shape[0]
        return self._batch_size

    def get_batch(self, names):    # pylint: disable=unused-argument
        """Get a batch of input for calibration.
        Args:
            names (List[str]): names of the network inputs to fill
        Returns:
            list of device memory pointers pointing to the memory containing
            each network input's data, or None if there are no more
            batches for calibration
        """
        print('Calibration starting...')
        try:
            data = self._batch_stream.next_batch()
            cuda.memcpy_htod(self._device_input, data)
            print('memcpy_htod finished...')
            return [int(self._device_input)]
        except StopIteration:
            # When we're out of batches, we return either [] or None.
            # This signals to TensorRT that there is no calibration data remaining.
            return None

    def read_calibration_cache(self):
        """Load a calibration cache.
        Returns:
            a cache object or None if there is no data
        """
        # If a cache file was given and exists, use it instead of calibrating
        # again. Otherwise, return None.
        if self._calib_cache and os.path.exists(self._calib_cache):
            with open(self._calib_cache, "rb") as calib_cache_file:
                return calib_cache_file.read()
        return None

    def write_calibration_cache(self, cache: memoryview):
        """Save a calibration cache.
        Args:
            cache (memoryview): the calibration cache to write
        """
        print('[Write calibration cache file]')
        with open(self._calib_cache, "wb") as calib_cache_file:
            calib_cache_file.write(cache)
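
For reference, a hypothetical minimal batch stream matching the contract the calibrator assumes: next_batch() returns a contiguous float32 array of the input shape and raises StopIteration when the data is exhausted (random data stands in for real calibration samples):

import numpy as np

class RandomBatchStream:
    """Hypothetical stand-in: yields random data instead of real calibration samples."""
    def __init__(self, num_batches: int, input_shape: tuple = (1, 3, 8, 112, 112)):
        self._num_batches = num_batches
        self._input_shape = input_shape
        self._count = 0

    def next_batch(self) -> np.ndarray:
        if self._count >= self._num_batches:
            raise StopIteration  # makes the calibrator return None
        self._count += 1
        # Contiguous float32 buffer, so cuda.memcpy_htod can copy it directly
        return np.ascontiguousarray(
            np.random.rand(*self._input_shape).astype(np.float32)
        )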

Environment

TensorRT Version: 8.2.1.8
NVIDIA GPU: A100
NVIDIA Driver Version: 470.103.01
CUDA Version: 11.1
CUDNN Version: 8.4.0
Operating System: Ubuntu 20.04, Linux 5.4.0-81-generic
Python Version (if applicable): 3.7.13
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.8.1+cu111
Baremetal or Container (if so, version):

MaxeeCR avatar Jun 30 '22 10:06 MaxeeCR

@MaxeeCR Could you check if you have set the calibration profile correctly? See: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#int8-calib-dynamic-shapes
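
A minimal sketch of what that section describes, reusing the variables from your build_engine above: with dynamic input shapes, the calibrator needs its own profile set via config.set_calibration_profile, in addition to the optimization profile(s):

# Sketch only: calibration runs with the opt shape of this profile
calib_profile = builder.create_optimization_profile()
for num_input in range(network_definition.num_inputs):
    calib_profile.set_shape(
        input=network_definition.get_input(num_input).name,
        min=min_shape,
        opt=optimal_shape,
        max=max_shape,
    )
config.set_calibration_profile(calib_profile)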

nvpohanh avatar Jun 30 '22 10:06 nvpohanh

@MaxeeCR Could you check if you have set the calibration profile correctly? See: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#int8-calib-dynamic-shapes

I added config.add_optimization_profile(profile) to the build_engine function, but the same error remains

MaxeeCR avatar Jul 01 '22 02:07 MaxeeCR

@MaxeeCR Could you provide your ONNX file so that we can repro and debug this issue? Thanks

nvpohanh avatar Jul 01 '22 06:07 nvpohanh

@MaxeeCR Could you provide your ONNX file so that we can repro and debug this issue? Thanks

onnx model: https://drive.google.com/file/d/1EpWYDYZQ4Pooa-sfCkxkA-a3FG5FzRIb/view?usp=sharing

MaxeeCR avatar Jul 04 '22 02:07 MaxeeCR

@nvpohanh I got the same error when building the engine with INT8; a static shape is used as the input. [screenshot of the same error]

cxiang26 avatar Aug 05 '22 08:08 cxiang26

I've also encountered this situation. [screenshot of the same error] The common factor is that the input we used has a rank of 5. I wonder whether TensorRT supports video input when building an engine in INT8 calibration mode? [screenshots of the rank-5 input]

isotopezzq avatar Nov 30 '23 08:11 isotopezzq