[castLayer.cpp::validate::33] Error Code 2: Internal Error (Assertion !mOutputTypes.at(0).hasValue() || mOutputTypes.at(0).value() == params.toType failed. )
Description
I tried to convert a model based on DALL-E 2 CLIP's text encoder. I first converted it to onnx, and then to trt. I am trying with only a single batch, and I also ran polygraphy surgeon sanitize on the onnx model. Regardless of whether I sanitize it or not, I get the error in the title. Attached the full log as well.
I have no clue how to debug this and could not even locate castLayer.cpp. Would appreciate any pointers, thanks!
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: A10G, CUDA 12.1
NVIDIA Driver Version: 525
CUDA Version: 12.1
CUDNN Version: 8.8.0
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.10
Tensorflow Version (if applicable): NA
PyTorch Version (if applicable): 2.3.0.dev20240117 (nightly)
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:23.12-py3
Relevant Files
Model link: error_log.txt attached. The model is 810 MB, with the same architecture as the DALL-E CLIP text encoder; it doesn't let me upload it.
Steps To Reproduce
Commands or scripts:
To Onnx:
onnx_model = torch.onnx.export(encode_text, (tokenized_text), "text_encoder.onnx", export_params=True, input_names=['text'], output_names=['text_feature'], )
(Optional): polygraphy surgeon sanitize text_encoder.onnx --fold-constants -o folded.onnx
To Trt:
trtexec --onnx=folded.onnx --fp16 --saveEngine=model.trt --precisionConstraints=prefer --layerPrecisions=*:fp16,*:fp32 --layerOutputTypes=*:fp16,*:fp32 --verbose
Have you tried the latest release?: I used 8.6.1, which is the latest.
Looks like you set layer precision constraints which violate the cast (which forces a precision).
Does it work if you run trtexec --onnx=folded.onnx --fp16 --int8?
Another suggestion is to check the cast layer in your model.
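For example, here is a rough sketch using the onnx Python package to list every Cast node and its target type (the file name folded.onnx is an assumption based on your commands). A Cast whose target type conflicts with the fp16/fp32 output type you force is the kind of thing to look for:

import onnx

# Sketch: enumerate every Cast node and the dtype it casts to.
# "folded.onnx" is an assumption taken from the sanitize command above.
model = onnx.load("folded.onnx")
for node in model.graph.node:
    if node.op_type == "Cast":
        to_attr = next(a for a in node.attribute if a.name == "to")
        print(node.name or node.output[0], "->",
              onnx.TensorProto.DataType.Name(to_attr.i))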
Thanks, so you mean remove --precisionConstraints=prefer --layerPrecisions=*:fp16,*:fp32 --layerOutputTypes=*:fp16,*:fp32 ?
How do I inspect the cast layer? What shall I look for?
so you mean remove
--precisionConstraints=prefer --layerPrecisions=*:fp16,*:fp32 --layerOutputTypes=*:fp16,*:fp32?
@nvpohanh do we support usage like this? *:fp16,*:fp32
How do I inspect the cast layer? What shall I look for?
Use netron.
@nvpohanh do we support usage like this? *:fp16,*:fp32
No. What does that mean? You can only specify one output type per tensor.
How do I inspect the cast layer? What shall I look for?
Use netron.
I used Netron, but I think the information from it is very abstract. What exactly shall I look for in the cast layer? There are so many cast layers if I search for them, and the error log doesn't say which cast I should look at.
Does it work if you run without --precisionConstraints=prefer --layerPrecisions=*:fp16,*:fp32 --layerOutputTypes=*:fp16,*:fp32?
Just trtexec --onnx=folded.onnx --fp16 --int8
Thanks @zerollzeng, I find that as long as I get rid of --layerOutputTypes it works.
Can I know why I would need both --fp16 and --int8?
Would it be redundant to specify precisionConstraints with --fp16?
Another thing: if I use the trt model converted dynamically with --minShapes=text:1x48 --optShapes=text:2x48 --maxShapes=text:4x48, inference doesn't work, since shape = engine.get_binding_shape(0) will be (-1, 48); when I create the binding with a pseudo input tensor, the tensor won't accept a negative shape.
Or would you provide an example for inference with dynamic shapes?
When working with dynamic shapes, you need to specify runtime dimensions. See docs here: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#runtime_dimensions
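Roughly something like this, as a minimal sketch with the TensorRT/PyCUDA Python APIs (the engine path model.trt, the int32 input, and a batch size of 1 are assumptions taken from your commands, not your exact setup):

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Resolve the -1 batch dimension at runtime; the shape must fall inside the
# min/opt/max range the engine was built with (1x48 .. 4x48 here).
context.set_binding_shape(0, (1, 48))

tokens = np.zeros((1, 48), dtype=np.int32)  # placeholder tokenized text
out_shape = tuple(context.get_binding_shape(1))
out_dtype = trt.nptype(engine.get_binding_dtype(1))
output = np.empty(out_shape, dtype=out_dtype)

d_in, d_out = cuda.mem_alloc(tokens.nbytes), cuda.mem_alloc(output.nbytes)
cuda.memcpy_htod(d_in, tokens)
context.execute_v2([int(d_in), int(d_out)])
cuda.memcpy_dtoh(output, d_out)
print(output.shape, output.dtype)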
When working with dynamic shapes, you need to specify runtime dimensions. See docs here: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#runtime_dimensions
thanks this works
Another very weird thing I observe is that when I do trtexec --onnx=model.onnx --fp16, the output trt model's inference results are very wrong; if I get rid of --fp16, the results align with the original PyTorch model or ONNX model. Both the original PyTorch model and the ONNX model are in fp16.
Is this expected?
@nvpohanh is it a bug?
That usually means something overflows in FP16. Could you share the repro steps so that we can debug it? Thanks
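One way to check for that on the PyTorch side, as a rough sketch with forward hooks (here model and tokenized_text are placeholders for your text encoder and a sample input, not your exact setup):

import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504

def make_hook(name):
    def hook(module, inputs, output):
        if torch.is_tensor(output) and output.is_floating_point():
            peak = output.detach().abs().max().item()
            if peak > FP16_MAX:
                print(f"{name}: max |activation| = {peak:.1f} exceeds fp16 range")
    return hook

# Run the text encoder in fp32 once and flag every layer whose activations
# would not fit in fp16. `model` and `tokenized_text` are placeholders.
model = model.float().eval()
for name, module in model.named_modules():
    module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(tokenized_text)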
@nvpohanh That's what I thought initially too, but the models converted from PyTorch and ONNX are both fp16.
The PyTorch model is CLIP's text encoder. During model loading, it converts to fp16. You can repro by using the open-source model and converting the text encoder to trt. The input is tokenized_text = clip.tokenize([prompt], truncate=True, context_length=48).type(torch.int).to(device), which is int32. The output is fp16 text encodings.
That usually means something overflows in FP16
This is how I converted to onnx:
onnx_model = torch.onnx.export(encode_text, (tokenized_text), "adobeone_text_encoder.onnx", export_params=True, input_names=['text'], output_names=['text_feature'], dynamic_axes = {"text": [0]} )
Also, when I add the argument do_constant_folding=True to the ONNX conversion above, the conversion to trt won't work; it reports that trt doesn't support dynamic batch. Is there a timeline for supporting this? Currently I add the arguments --minShapes, --optShapes, --maxShapes when I convert to trt, but as long as I have do_constant_folding=True, the trt conversion won't work.
Is there a tool we can use to visualize whether it overflows in fp16?
@ecilay Could you share the ONNX model that you have exported?
Also, when I add the argument do_constant_folding=True to the ONNX conversion above, the conversion to trt won't work; it reports that trt doesn't support dynamic batch. Is there a timeline for supporting this?
TRT already supports dynamic shapes, and using do_constant_folding=True should not have broken that. Could you share the ONNX model with and without do_constant_folding=True so that we can take a look? Thanks
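For reference, a sketch of an export call that combines do_constant_folding=True with a named dynamic batch axis (encode_text and tokenized_text are placeholders for your module and sample input, and the output file name is an assumption):

import torch

# Constant folding and a dynamic batch axis are normally compatible;
# naming the axis keeps dim 0 symbolic in the exported graph.
torch.onnx.export(
    encode_text,                       # placeholder for your text-encoder module
    (tokenized_text,),                 # placeholder sample input
    "text_encoder_dynamic.onnx",
    export_params=True,
    do_constant_folding=True,
    input_names=["text"],
    output_names=["text_feature"],
    dynamic_axes={"text": {0: "batch"}},
)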
Sorry, I don't think I can share this publicly since it is my company's production model. But it is just CLIP's text encoder, for which I shared repro steps above.
Sorry, without a repro we cannot debug the issue. I will close this since there is no activity; please reopen when we get a repro.