TRT compilation failure of TensorRT 8.6 when running quantized resnet18 on GPU A4000
Description
I was trying to use the TensorRT Model Optimizer (modelopt) library to quantize a ResNet-18 from PyTorch. The code to reproduce is:
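import os
from copy import deepcopy
import torch
import modelopt.torch.quantization as mtq
from torch import nn
from torchvision import models
from tqdm import tqdm

# Define the model and replace the classifier head with a 10-class layer
resnet18 = models.resnet18(pretrained=True)
resnet18.fc = nn.Linear(resnet18.fc.in_features, 10)
resnet18.to('cpu')

# Calibration loop; testloader is a DataLoader defined elsewhere (not shown)
def forward_loop(model):
    for images, labels in tqdm(testloader):
        model(images)

# Quantize with the INT8 SmoothQuant config
config = deepcopy(mtq.INT8_SMOOTHQUANT_CFG)
resnet18 = mtq.quantize(resnet18, config, forward_loop=forward_loop)
# Export to ONNX; quantized_dir is defined elsewhere (not shown)
torch.onnx.export(resnet18, torch.randn(1, 3, 32, 32), os.path.join(quantized_dir, "saved_model.onnx"), verbose=True, input_names=["input"], output_names=["output"])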
I then ran Polygraphy constant folding:
python ~/trt_model_opt/bin/polygraphy surgeon sanitize --fold-constants saved_model.onnx -o saved_model.onnx
Then, when I compiled it with TRT, I got the following:
[07/12/2024-00:08:34] [TRT] [V] --------------- Timing Runner: /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1 (Scale[0x80000007])
[07/12/2024-00:08:34] [TRT] [V] Skipping tactic 0x0000000000000000 due to exception Assertion numScales == mGlobRefs.scale.count() failed.
[07/12/2024-00:08:34] [TRT] [V] /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1 (Scale[0x80000007]) profiling completed in 0.011867 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[07/12/2024-00:08:34] [TRT] [V] *************** Autotuning format combination: Int8(128,64:32,8,1) -> Float(4096,64,8,1) ***************
[07/12/2024-00:08:34] [TRT] [V] --------------- Timing Runner: /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1 (Scale[0x80000007])
[07/12/2024-00:08:34] [TRT] [V] Skipping tactic 0x0000000000000000 due to exception Assertion numScales == mGlobRefs.scale.count() failed.
[07/12/2024-00:08:34] [TRT] [V] /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1 (Scale[0x80000007]) profiling completed in 0.0113962 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[07/12/2024-00:08:34] [TRT] [E] 10: Could not find any implementation for node /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1.
Could you please take a look?
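For triage, a verbose build log like the one above can be condensed into a list of the nodes that failed and the exceptions that caused their tactics to be skipped. The following is a minimal sketch, assuming only the log line format quoted above. The function name summarize_trt_log and the regular expressions are hypothetical helpers written for illustration; they are not part of TensorRT or Polygraphy.

```python
import re
from collections import Counter

# These patterns are assumptions derived only from the log lines quoted above.
RUNNER_RE = re.compile(r"-+ Timing Runner: (\S+) \(")
SKIP_RE = re.compile(r"Skipping tactic (0x[0-9a-fA-F]+) due to exception (.+)$")
FAIL_RE = re.compile(r"Could not find any implementation for node (\S+?)\.?$")


def summarize_trt_log(lines):
    """Summarize a verbose TensorRT build log.

    Returns a tuple (skipped, failed):
      skipped: Counter keyed by (node name, exception text), counting skipped tactics
      failed:  list of node names for which no implementation was found
    """
    skipped = Counter()
    failed = []
    current_node = None  # node named by the most recent "Timing Runner" line
    for line in lines:
        line = line.rstrip()
        match = RUNNER_RE.search(line)
        if match:
            current_node = match.group(1)
            continue
        match = SKIP_RE.search(line)
        if match:
            skipped[(current_node, match.group(2))] += 1
            continue
        match = FAIL_RE.search(line)
        if match:
            failed.append(match.group(1))
    return skipped, failed
```

It can be applied to a saved log with something like summarize_trt_log(open(path)), where path points to wherever the verbose output was written. On the excerpt above it would report a single failing node, the DequantizeLinear layer in front of maxpool, with the numScales assertion as the only skip reason.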
Environment
TensorRT Version: 8.6
Relevant Files
Model link: https://file.io/1IiicEUK92IM
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?: Didn't try TRT 10
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): yes
It looks like the build fails with the exception "Assertion numScales == mGlobRefs.scale.count() failed." Can you try the latest version of TRT?
Got it, I will give TRT 10 a try.