Failed fp16 inference.
Hi, I have modified your export function for fp16 inference, but I run into the following issue. Here is the modified export code:
pmodel.load_state_dict(ckpt)
pmodel.eval()
pmodel.cuda()
pmodel.half()
C, H, W = (3, 256, 192)
# model_wrapper = PoseModelWrapper(backbone=pose_model.backbone, head=pose_model.keypoint_head)
trt_ts_module = torch_tensorrt.compile(
    pmodel,
    # If the inputs to the module are plain Tensors, specify them via the `inputs` argument:
    inputs=[
        torch_tensorrt.Input(  # Specify input object with shape and dtype
            shape=[1, C, H, W],
            dtype=torch.half,
        )
    ],
    # TODO: ADD Datatype for inference. Allowed options torch.(float|half|int8|int32|bool)
    enabled_precisions={torch.half},  # half: run with FP16
    workspace_size=1 << 32,
)
torch.jit.save(trt_ts_module, engine_file_path)  # save the TRT-embedded TorchScript
I am sure the inference input to the model is on CUDA and in half format:
img_crop = torch.from_numpy(img_crop).cuda().half()
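For what it's worth, a quick check right before the forward call confirms the device and dtype (small sketch; img_crop here is the tensor from the line above):

# Sanity check just before calling the TRT module
assert img_crop.is_cuda and img_crop.dtype == torch.half, (img_crop.device, img_crop.dtype)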
And here is the detailed error message.
Inference 2D pose: 0%| | 1/1196 [00:00<02:02, 9.74it/s]Traceback (most recent call last):
File "/home/khanh/mvai/code/sdc/mocap-mdc-tracking/mocap_mdc_tracking/detect/pose_trt.py", line 123, in <module>
test_inference(vit_jit_model, smt_frms, vid_path, out_debug_dir=out_debug_dir)
File "/home/khanh/mvai/code/sdc/mocap-mdc-tracking/mocap_mdc_tracking/detect/pose_trt.py", line 87, in test_inference
heatmaps = vit_jit_model(img_crop).detach().cpu().numpy()
File "/home/khanh/.cache/pypoetry/virtualenvs/mocap-demo-vNvmCQwv-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/mocap_mdc_tracking/detect/vit_models/model.py", line 11, in forward
_0 = ops.tensorrt.execute_engine([input_1], __torch___mocap_mdc_tracking_detect_vit_models_model_ViTPose_trt_engine_0x596979e5cbd0)
_1, = _0
input = torch.conv_transpose2d(_1, CONSTANTS.c0, None, [2, 2], [1, 1])
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
input0 = torch.batch_norm(input, CONSTANTS.c1, CONSTANTS.c2, CONSTANTS.c3, CONSTANTS.c4, False, 0.10000000000000001, 1.0000000000000001e-05, True)
input1 = torch.conv_transpose2d(torch.relu(input0), CONSTANTS.c5, None, [2, 2], [1, 1])
Traceback of TorchScript, original code (most recent call last):
File "/home/khanh/.cache/pypoetry/virtualenvs/mocap-demo-vNvmCQwv-py3.10/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 956, in forward
num_spatial_dims, self.dilation) # type: ignore[arg-type]
return F.conv_transpose2d(
~~~~~~~~~~~~~~~~~~ <--- HERE
input, self.weight, self.bias, self.stride, self.padding,
output_padding, self.groups, self.dilation)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
Do you have any idea about this issue? I just wonder if you have tested the export with fp16.
Hello! I did not test with half precision as my GPU is a bit old and does not benefit much from it.
Adding require_full_compilation=True during the TRT export seems to work, and I am able to run inference in half precision (using the same modifications you made to the input). Note that, from the docs, it seems this flag will force the dynamo backend to be used.
trt_ts_module = torch_tensorrt.compile(
    model,
    # If the inputs to the module are plain Tensors, specify them via the `inputs` argument:
    inputs=[
        torch_tensorrt.Input(  # Specify input object with shape and dtype
            shape=[1, C, H, W],
            dtype=torch.half,
        )
    ],
    enabled_precisions={torch.half},  # half: run with FP16
    workspace_size=1 << 28,
    require_full_compilation=True,
)
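With that module, inference on my side looks roughly like this (sketch reusing the engine_file_path / img_crop names from your snippets):

# Load the TRT-embedded TorchScript module and run it on a half-precision CUDA input
trt_module = torch.jit.load(engine_file_path).cuda()
img_crop = torch.from_numpy(img_crop).cuda().half()
heatmaps = trt_module(img_crop).detach().cpu().numpy()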
That said, I still get a few warnings during export:
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT encountered issues when converting weights between types and that could affect accuracy.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[...]
Check the results, or try to use a more recent opset during export (I did not find a way to specify it).
Note that I did not .half() the model weights as you did; maybe that changes the export results.
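Concretely, the only difference on my side before the compile call is roughly this (sketch, assuming the same model / ckpt names as above):

# Keep the weights in fp32 and let torch_tensorrt handle fp16 via enabled_precisions
model.load_state_dict(ckpt)
model.eval()
model.cuda()  # note: no model.half() call before torch_tensorrt.compile(...)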
If you manage to get half precision working and see benefits from it, don't hesitate to open a PR to adapt the inference / export code, I would appreciate it! Lmk how it goes, good luck.
Thanks for your answer. I tried to use the require_full_compilation flag, but it fails:
"/home/khanh/.cache/pypoetry/virtualenvs/mocap-demo-vNvmCQwv-py3.10/lib/python3.10/site-packages/mmcv/cnn/bricks/transformer.py:33: UserWarning: Fail to import MultiScaleDeformableAttention from mmcv.ops.multi_scale_deform_attn, You should install mmcv-full if you need this module.
warnings.warn('Fail to import MultiScaleDeformableAttention from '
/home/khanh/.cache/pypoetry/virtualenvs/mocap-demo-vNvmCQwv-py3.10/lib/python3.10/site-packages/torch_tensorrt/fx/converters/acc_ops_converters.py:3376: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if approximate is not "none":
ERROR: [Torch-TensorRT] - Method requested cannot be compiled end to end by Torch-TensorRT.TorchScript.
Unsupported operators listed below:
- aten::conv_transpose2d.input(Tensor input, Tensor weight, Tensor? bias=None, int[2] stride=1, int[2] padding=0, int[2] output_padding=0, int groups=1, int[2] dilation=1) -> Tensor You can either implement converters for these ops in your application or request implementation"
I am using torch_tensorrt version 1.3.0
Did you try to install mmcv-full as suggested? I can check the version I am using. I am passing the original ckpt, not the TorchScript-compiled version, to the torch_tensorrt compile function though.
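To compare environments, maybe post the output of something like this (trivial sketch):

import torch
import torch_tensorrt

# Print the versions on both sides to rule out a mismatch
print(torch.__version__, torch_tensorrt.__version__)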