[castLayer.cpp::validate::33] Error Code 2: Internal Error (Assertion !mOutputTypes.at(0).hasValue() || mOutputTypes.at(0).value() == params.toType failed. )
Description
I tried to convert a model based on DALL-E 2 CLIP's text encoder. I first converted it to onnx, and then to trt. I am trying with only a single batch, and I also ran polygraphy surgeon sanitize on the onnx model. Regardless of whether I sanitize it or not, I get the error in the title. Attached the full log as well.
I have no clue how to debug this and could not even locate castLayer.cpp. Would appreciate any pointers, thanks!
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: A10G, CUDA 12.1
NVIDIA Driver Version: 525
CUDA Version: 12.1
CUDNN Version: 8.8.0
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.10
Tensorflow Version (if applicable): NA
PyTorch Version (if applicable): 2.3.0.dev20240117 (nightly)
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:23.12-py3
Relevant Files
Model link: error_log.txt attached. The model is 810 MB, with the same architecture as the DALL-E CLIP text encoder; it doesn't let me upload it.
Steps To Reproduce
Commands or scripts:
To Onnx:
onnx_model = torch.onnx.export(encode_text, (tokenized_text), "text_encoder.onnx", export_params=True, input_names=['text'], output_names=['text_feature'], )
(Optional): polygraphy surgeon sanitize text_encoder.onnx --fold-constants -o folded.onnx
To Trt:
trtexec --onnx=folded.onnx --fp16 --saveEngine=model.trt --precisionConstraints=prefer --layerPrecisions=*:fp16,*:fp32 --layerOutputTypes=*:fp16,*:fp32 --verbose
Have you tried the latest release?: I used 8.6.1, which is the latest.
Looks like you set layer precision constraints which violate the cast (which forces a precision).
Does it work if you run trtexec --onnx=folded.onnx --fp16 --int8?
Another suggestion is to check the cast layer in your model.
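For example, here is a rough sketch using the onnx Python package to list every Cast node and its target type (the file name folded.onnx is an assumption based on your commands). A Cast whose target type conflicts with the fp16/fp32 output type you force is the kind of thing to look for:

import onnx

# Sketch: enumerate every Cast node and the dtype it casts to.
# "folded.onnx" is an assumption taken from the sanitize command above.
model = onnx.load("folded.onnx")
for node in model.graph.node:
    if node.op_type == "Cast":
        to_attr = next(a for a in node.attribute if a.name == "to")
        print(node.name or node.output[0], "->",
              onnx.TensorProto.DataType.Name(to_attr.i))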
Thanks, so you mean remove --precisionConstraints=prefer --layerPrecisions=*:fp16,*:fp32 --layerOutputTypes=*:fp16,*:fp32 ?
How do I inspect the cast layer? What shall I look for?
so you mean remove
--precisionConstraints=prefer --layerPrecisions=*:fp16,*:fp32 --layerOutputTypes=*:fp16,*:fp32?
@nvpohanh do we support usage like this? *:fp16,*:fp32
How do I inspect the cast layer? What shall I look for?
Use netron.
@nvpohanh do we support usage like this? *:fp16,*:fp32
No. What does that mean? You can only specify one output type per tensor.
How do I inspect the cast layer? What shall I look for?
Use netron.
I used Netron, but I think the information from it is very abstract. What exactly shall I look for in the cast layer? There are so many cast layers if I search for them, and the error log doesn't say which cast I should look at.
Does it work if you run without --precisionConstraints=prefer --layerPrecisions=*:fp16,*:fp32 --layerOutputTypes=*:fp16,*:fp32?
Just trtexec --onnx=folded.onnx --fp16 --int8
Thanks @zerollzeng, I find that as long as I get rid of --layerOutputTypes it works.
Can I know why I would need both --fp16 and --int8?
Would it be redundant to specify precisionConstraints with --fp16?
Another thing: if I use the trt model converted dynamically with --minShapes=text:1x48 --optShapes=text:2x48 --maxShapes=text:4x48, inference doesn't work, since shape = engine.get_binding_shape(0) will be (-1, 48); when I create the binding with a pseudo input tensor, the tensor won't accept a negative shape.
Or would you provide an example for inference with dynamic shapes?
When working with dynamic shapes, you need to specify runtime dimensions. See docs here: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#runtime_dimensions
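Roughly something like this, as a minimal sketch with the TensorRT/PyCUDA Python APIs (the engine path model.trt, the int32 input, and a batch size of 1 are assumptions taken from your commands, not your exact setup):

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Resolve the -1 batch dimension at runtime; the shape must fall inside the
# min/opt/max range the engine was built with (1x48 .. 4x48 here).
context.set_binding_shape(0, (1, 48))

tokens = np.zeros((1, 48), dtype=np.int32)  # placeholder tokenized text
out_shape = tuple(context.get_binding_shape(1))
out_dtype = trt.nptype(engine.get_binding_dtype(1))
output = np.empty(out_shape, dtype=out_dtype)

d_in, d_out = cuda.mem_alloc(tokens.nbytes), cuda.mem_alloc(output.nbytes)
cuda.memcpy_htod(d_in, tokens)
context.execute_v2([int(d_in), int(d_out)])
cuda.memcpy_dtoh(output, d_out)
print(output.shape, output.dtype)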
When working with dynamic shapes, you need to specify runtime dimensions. See docs here: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#runtime_dimensions
thanks this works
Another very weird thing I observe is that when I do trtexec --onnx=model.onnx --fp16, the output trt model's inference results are very wrong; if I get rid of --fp16, the results align with the original PyTorch model or ONNX model. Both the original PyTorch model and the ONNX model are in fp16.
Is this expected?
@nvpohanh is it a bug?
That usually means something overflows in FP16. Could you share the repro steps so that we can debug it? Thanks
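One way to check for that on the PyTorch side, as a rough sketch with forward hooks (here model and tokenized_text are placeholders for your text encoder and a sample input, not your exact setup):

import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504

def make_hook(name):
    def hook(module, inputs, output):
        if torch.is_tensor(output) and output.is_floating_point():
            peak = output.detach().abs().max().item()
            if peak > FP16_MAX:
                print(f"{name}: max |activation| = {peak:.1f} exceeds fp16 range")
    return hook

# Run the text encoder in fp32 once and flag every layer whose activations
# would not fit in fp16. `model` and `tokenized_text` are placeholders.
model = model.float().eval()
for name, module in model.named_modules():
    module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(tokenized_text)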
@nvpohanh That's what I thought initially too, but the models converted from PyTorch and ONNX are both fp16.
The PyTorch model is CLIP's text encoder. During model loading, it converts to fp16. You can repro by using the open-source model and converting the text encoder to trt. The input is tokenized_text = clip.tokenize([prompt], truncate=True, context_length=48).type(torch.int).to(device), which is int32. The output is fp16 text encodings.
That usually means something overflows in FP16
This is how I converted to onnx:
onnx_model = torch.onnx.export(encode_text, (tokenized_text), "adobeone_text_encoder.onnx", export_params=True, input_names=['text'], output_names=['text_feature'], dynamic_axes = {"text": [0]} )
Also, when I add the argument do_constant_folding=True to the ONNX conversion above, the conversion to trt won't work; it reports that trt doesn't support dynamic batch. Is there a timeline for supporting this? Currently I add the arguments --minShapes, --optShapes, --maxShapes when I convert to trt, but as long as I have do_constant_folding=True, the trt conversion won't work.
Is there a tool we can use to visualize whether it overflows in fp16?
@ecilay Could you share the ONNX model that you have exported?
Also, when I add the argument do_constant_folding=True to the ONNX conversion above, the conversion to trt won't work; it reports that trt doesn't support dynamic batch. Is there a timeline for supporting this?
TRT already supports dynamic shapes, and using do_constant_folding=True should not have broken that. Could you share the ONNX model with and without do_constant_folding=True so that we can take a look? Thanks
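For reference, a sketch of an export call that combines do_constant_folding=True with a named dynamic batch axis (encode_text and tokenized_text are placeholders for your module and sample input, and the output file name is an assumption):

import torch

# Constant folding and a dynamic batch axis are normally compatible;
# naming the axis keeps dim 0 symbolic in the exported graph.
torch.onnx.export(
    encode_text,                       # placeholder for your text-encoder module
    (tokenized_text,),                 # placeholder sample input
    "text_encoder_dynamic.onnx",
    export_params=True,
    do_constant_folding=True,
    input_names=["text"],
    output_names=["text_feature"],
    dynamic_axes={"text": {0: "batch"}},
)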
Sorry, I don't think I can share this publicly since it is my company's production model. But it is just CLIP's text encoder, for which I shared repro steps above.
Sorry, without a repro we cannot debug the issue. I will close this since there is no activity; please reopen when we get a repro.