TensorRT PTQ is faster than QAT

Description

Environment

TensorRT Version: 8.4.1.5
NVIDIA GPU: 1080 Ti
NVIDIA Driver Version: 450
CUDA Version: 11.0
cuDNN Version: 8.1.0
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

When I use PTQ, the fused layer 'PWN(PWN(Sigmoid, Mul), Add)' runs 'Int8' -> 'Int8'. But when I use QAT, the same layer runs 'Int8' -> 'Float32'.

The INT8 ONNX model is:

Aug 02 '22 08:08 pangr

@ttyio Do you have any recommendations on the QDQ placement here? I think the user can fine-tune it to get better performance.

Aug 03 '22 01:08 zerollzeng

@pangr, what's the op after the Add? Also, have you tried inserting Q/DQ after the Add? Thanks.

Aug 26 '22 07:08 ttyio
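ttyio's suggestion can be sketched on a toy graph. As we understand TensorRT's explicit-quantization (Q/DQ) mode, a tensor generally stays in INT8 only where a QuantizeLinear/DequantizeLinear pair marks it, so an Add with no Q/DQ on its output tends to emit Float32. The sketch below is only an illustration under assumptions: it uses a hypothetical minimal `Node` dataclass instead of the real `onnx` API, and the helper `insert_qdq_after`, the tensor names, and the `Conv` standing in for "the op after the Add" are all made up. It shows the rewiring step: add Q -> DQ on the Add's output and point every downstream consumer at the dequantized tensor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical, minimal stand-in for an ONNX node (not the onnx API)."""
    op_type: str
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def insert_qdq_after(nodes: List[Node], target_name: str,
                     scale: str, zero_point: str) -> List[Node]:
    """Insert QuantizeLinear -> DequantizeLinear on the first output of
    `target_name` and rewire every later consumer to the dequantized tensor.

    Assumes `nodes` is topologically sorted; graph outputs are not rewired.
    """
    idx = next(i for i, n in enumerate(nodes) if n.name == target_name)
    src = nodes[idx].outputs[0]
    q_out, dq_out = f"{src}_q", f"{src}_dq"

    # Rewire downstream consumers before creating the new nodes,
    # so the Q node itself keeps reading the original tensor.
    for n in nodes[idx + 1:]:
        n.inputs = [dq_out if t == src else t for t in n.inputs]

    q = Node("QuantizeLinear", f"{target_name}_quant",
             [src, scale, zero_point], [q_out])
    dq = Node("DequantizeLinear", f"{target_name}_dequant",
              [q_out, scale, zero_point], [dq_out])
    return nodes[:idx + 1] + [q, dq] + nodes[idx + 1:]


# Toy version of the fused pattern: Add(Mul(x, Sigmoid(x)), residual),
# followed by a hypothetical Conv as "the op after the Add".
graph = [
    Node("Sigmoid", "sig", ["x"], ["s"]),
    Node("Mul", "mul", ["x", "s"], ["m"]),
    Node("Add", "add", ["m", "residual"], ["a"]),
    Node("Conv", "conv", ["a", "W"], ["y"]),
]
graph = insert_qdq_after(graph, "add", "add_scale", "add_zp")

for n in graph:
    print(n.op_type, n.inputs, "->", n.outputs)
```

After the rewrite, the Conv reads `a_dq` instead of `a`. The scale and zero-point are just tensor names here; in a real QAT model they would come from training or calibration. Whether this actually turns the fused PWN into 'Int8' -> 'Int8' depends on TensorRT's fusion rules, so it is worth checking the engine's layer information again after rebuilding.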

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!

Nov 01 '22 02:11 ttyio