TensorRT
🐛 [Bug] Torch-TRT QDQ nodes affect perf vs PTQ, native TRT they do not
Bug Description
When using the PyTorch QAT toolkit, QAT inference through Torch-TRT is slower than PTQ; with native TRT this is not the case.
Torch-TRT:
| Model | Accuracy | Performance |
|---|---|---|
| Baseline MobileNetv2 | 75.56% | 11.92ms |
| Base + TRT(TRT FP32) | 75.59% | 6.78ms |
| PTQ + TRT(TRT int8) | 71.41% | 1.57ms |
| QAT+TRT(TRT INT8) | 74.00% | 2.18ms |
Native TRT:
| Model | Accuracy | Performance |
|---|---|---|
| Baseline MobileNetv2 | 71.11% | 11.92ms |
| Base + TRT (TRT FP32) | 71.13% | 5.95ms |
| PTQ + TRT (TRT int8) | 68.11% | 1.59ms |
| QAT+TRT (TRT INT8) | 70.31% | 1.61ms |
To Reproduce
Steps to reproduce the behavior:
- Torch-TRT notebook
- TRT notebook - reach out to @ncomly-nvidia
Expected behavior
The effect of QDQ nodes on perf is the same between TRT & Torch-TRT.
Environment
Build information about Torch-TensorRT can be found by turning on debug messages.
DLFW 22.04: nvcr.io/nvidia/pytorch:22.04-py3