Failed fp16 inference.
Hi, I have modified your export function for fp16 inference, but I run into the following issue. Here is the modified export code:
pmodel.load_state_dict(ckpt)
pmodel.eval()
pmodel.cuda()
pmodel.half()
C, H, W = (3, 256, 192)
# model_wrapper = PoseModelWrapper(backbone=pose_model.backbone, head=pose_model.keypoint_head)
trt_ts_module = torch_tensorrt.compile(
    pmodel,
    # If the inputs to the module are plain Tensors, specify them via the `inputs` argument:
    inputs=[
        torch_tensorrt.Input(  # Specify input object with shape and dtype
            shape=[1, C, H, W],
            dtype=torch.half,
        )
    ],
    # TODO: ADD Datatype for inference. Allowed options torch.(float|half|int8|int32|bool)
    enabled_precisions={torch.half},  # half: run with FP16
    workspace_size=1 << 32,
)
torch.jit.save(trt_ts_module, engine_file_path)  # save the TRT-embedded TorchScript
I am sure the inference input to the model is on CUDA and in half format:
img_crop = torch.from_numpy(img_crop).cuda().half()
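For what it's worth, a quick check right before the forward call confirms the device and dtype (small sketch; img_crop here is the tensor from the line above):

# Sanity check just before calling the TRT module
assert img_crop.is_cuda and img_crop.dtype == torch.half, (img_crop.device, img_crop.dtype)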
And here is the detailed error message.
Inference 2D pose: 0%| | 1/1196 [00:00<02:02, 9.74it/s]Traceback (most recent call last):
File "/home/khanh/mvai/code/sdc/mocap-mdc-tracking/mocap_mdc_tracking/detect/pose_trt.py", line 123, in <module>
test_inference(vit_jit_model, smt_frms, vid_path, out_debug_dir=out_debug_dir)
File "/home/khanh/mvai/code/sdc/mocap-mdc-tracking/mocap_mdc_tracking/detect/pose_trt.py", line 87, in test_inference
heatmaps = vit_jit_model(img_crop).detach().cpu().numpy()
File "/home/khanh/.cache/pypoetry/virtualenvs/mocap-demo-vNvmCQwv-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/mocap_mdc_tracking/detect/vit_models/model.py", line 11, in forward
_0 = ops.tensorrt.execute_engine([input_1], __torch___mocap_mdc_tracking_detect_vit_models_model_ViTPose_trt_engine_0x596979e5cbd0)
_1, = _0
input = torch.conv_transpose2d(_1, CONSTANTS.c0, None, [2, 2], [1, 1])
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
input0 = torch.batch_norm(input, CONSTANTS.c1, CONSTANTS.c2, CONSTANTS.c3, CONSTANTS.c4, False, 0.10000000000000001, 1.0000000000000001e-05, True)
input1 = torch.conv_transpose2d(torch.relu(input0), CONSTANTS.c5, None, [2, 2], [1, 1])
Traceback of TorchScript, original code (most recent call last):
File "/home/khanh/.cache/pypoetry/virtualenvs/mocap-demo-vNvmCQwv-py3.10/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 956, in forward
num_spatial_dims, self.dilation) # type: ignore[arg-type]
return F.conv_transpose2d(
~~~~~~~~~~~~~~~~~~ <--- HERE
input, self.weight, self.bias, self.stride, self.padding,
output_padding, self.groups, self.dilation)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
Do you have any idea about this issue? I just wonder if you have tested the export with fp16.
Hello! I did not test with half precision as my GPU is a bit old and does not benefit much from it.
Adding require_full_compilation=True during the TRT export seems to work, and I am able to run inference in half precision (using the same modifications you made to the input). Note that, from the docs, it seems this flag will force the dynamo backend to be used.
trt_ts_module = torch_tensorrt.compile(
    model,
    # If the inputs to the module are plain Tensors, specify them via the `inputs` argument:
    inputs=[
        torch_tensorrt.Input(  # Specify input object with shape and dtype
            shape=[1, C, H, W],
            dtype=torch.half,
        )
    ],
    enabled_precisions={torch.half},  # half: run with FP16
    workspace_size=1 << 28,
    require_full_compilation=True,
)
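With that module, inference on my side looks roughly like this (sketch reusing the engine_file_path / img_crop names from your snippets):

# Load the TRT-embedded TorchScript module and run it on a half-precision CUDA input
trt_module = torch.jit.load(engine_file_path).cuda()
img_crop = torch.from_numpy(img_crop).cuda().half()
heatmaps = trt_module(img_crop).detach().cpu().numpy()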
That said, I still get a few warnings during export:
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT encountered issues when converting weights between types and that could affect accuracy.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[...]
Check the results, or try to use a more recent opset during export (I did not find a way to specify it).
Note that I did not .half() the model weights as you did; maybe that changes the export results.
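Concretely, the only difference on my side before the compile call is roughly this (sketch, assuming the same model / ckpt names as above):

# Keep the weights in fp32 and let torch_tensorrt handle fp16 via enabled_precisions
model.load_state_dict(ckpt)
model.eval()
model.cuda()  # note: no model.half() call before torch_tensorrt.compile(...)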
If you manage to get half precision working and see benefits from it, don't hesitate to open a PR to adapt the inference / export code, I would appreciate it! Lmk how it goes, good luck.
Thanks for your answer. I tried to use the require_full_compilation flag, but it fails:
"/home/khanh/.cache/pypoetry/virtualenvs/mocap-demo-vNvmCQwv-py3.10/lib/python3.10/site-packages/mmcv/cnn/bricks/transformer.py:33: UserWarning: Fail to import MultiScaleDeformableAttention from mmcv.ops.multi_scale_deform_attn, You should install mmcv-full if you need this module.
warnings.warn('Fail to import MultiScaleDeformableAttention from '
/home/khanh/.cache/pypoetry/virtualenvs/mocap-demo-vNvmCQwv-py3.10/lib/python3.10/site-packages/torch_tensorrt/fx/converters/acc_ops_converters.py:3376: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if approximate is not "none":
ERROR: [Torch-TensorRT] - Method requested cannot be compiled end to end by Torch-TensorRT.TorchScript.
Unsupported operators listed below:
- aten::conv_transpose2d.input(Tensor input, Tensor weight, Tensor? bias=None, int[2] stride=1, int[2] padding=0, int[2] output_padding=0, int groups=1, int[2] dilation=1) -> Tensor You can either implement converters for these ops in your application or request implementation"
I am using torch_tensorrt version 1.3.0
Did you try to install mmcv-full as suggested? I can check the version I am using. I am passing the original ckpt, not the TorchScript-compiled version, to the torch_tensorrt compile function though.
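To compare environments, maybe post the output of something like this (trivial sketch):

import torch
import torch_tensorrt

# Print the versions on both sides to rule out a mismatch
print(torch.__version__, torch_tensorrt.__version__)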