
🐛 [Bug] Torch-TRT QDQ nodes affect perf vs PTQ, native TRT they do not

Open ncomly-nvidia opened this issue 3 years ago • 0 comments

Bug Description

When using the PyT-QAT toolkit, QAT inference through Torch-TRT is slower than PTQ inference; with native TRT this is not the case.
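For context, a QDQ (QuantizeLinear/DequantizeLinear) node pair simulates int8 rounding around an op; TensorRT fuses these pairs into real int8 kernels, and unfused pairs add overhead. A minimal pure-Python sketch of the quantize-dequantize math (illustrative only, not the toolkit's implementation):

```python
def qdq(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Fake-quantize a float to int8 and back (per-tensor affine).

    This mirrors what a QDQ pair does in the exported graph:
    round to the int8 grid, clamp to the representable range,
    then map back to float.
    """
    q = round(x / scale) + zero_point          # quantize
    q = max(qmin, min(qmax, q))                # clamp to int8 range
    return (q - zero_point) * scale            # dequantize

# Values on the int8 grid round-trip almost exactly...
print(qdq(0.30, scale=0.01))   # ~0.30
# ...while out-of-range values saturate at qmax * scale.
print(qdq(5.00, scale=0.01))   # ~1.27
```

When the backend fuses each QDQ pair into the adjacent kernel, this round-trip becomes a no-op at runtime; the perf gap reported here suggests some pairs are not being fused in the Torch-TRT path.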

Torch-TRT:

| Model | Accuracy | Performance |
|---|---|---|
| Baseline MobileNetV2 | 75.56% | 11.92 ms |
| Base + TRT (TRT FP32) | 75.59% | 6.78 ms |
| PTQ + TRT (TRT INT8) | 71.41% | 1.57 ms |
| QAT + TRT (TRT INT8) | 74.00% | 2.18 ms |

Native TRT:

| Model | Accuracy | Performance |
|---|---|---|
| Baseline MobileNetV2 | 71.11% | 11.92 ms |
| Base + TRT (TRT FP32) | 71.13% | 5.95 ms |
| PTQ + TRT (TRT INT8) | 68.11% | 1.59 ms |
| QAT + TRT (TRT INT8) | 70.31% | 1.61 ms |

To Reproduce

Steps to reproduce the behavior:

  1. Torch-TRT notebook
  2. TRT notebook - reach out to @ncomly-nvidia

Expected behavior

The effect of QDQ nodes on perf is the same between native TRT and Torch-TRT, i.e. QAT INT8 runs at roughly PTQ INT8 speed.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages.

DLFW 22.04: nvcr.io/nvidia/pytorch:22.04-py3

Additional context

ncomly-nvidia avatar Aug 30 '22 17:08 ncomly-nvidia