PTQ is faster than QAT

Open pangr opened this issue 3 years ago • 2 comments

Description

Environment

TensorRT Version: 8.4.1.5
NVIDIA GPU: 1080ti
NVIDIA Driver Version: 450
CUDA Version: 11.0
CUDNN Version: 8.1.0
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

When I use PTQ, the fused layer 'PWN(PWN(Sigmoid, Mul), Add)' runs as 'Int8' -> 'Int8': (image)

But when I use QAT, the same layer runs as 'Int8' -> 'Float32': (image)

The INT8 ONNX model is: (image)

pangr avatar Aug 02 '22 08:08 pangr

@ttyio Do you have any recommendations on the QDQ placement here? I think the user can fine-tune it to get better performance.

zerollzeng avatar Aug 03 '22 01:08 zerollzeng

@pangr, what op comes after the Add? Also, have you tried inserting Q/DQ after the Add? Thanks.

ttyio avatar Aug 26 '22 07:08 ttyio
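
For readers unfamiliar with the suggestion above, here is a minimal numerical sketch of what a Q/DQ pair placed after the Add computes. It is plain Python, not the TensorRT or pytorch-quantization API. The function names and the `out_scale` value are hypothetical, chosen only for illustration. In a real QAT flow the scale would come from calibration or training. Whether the extra Q/DQ pair makes TensorRT emit an Int8 output for this layer depends on the rest of the graph, which is what the question about the following op is probing.

```python
import math

def quantize(x, scale):
    """Symmetric INT8 quantization: round to nearest (ties to even), then saturate."""
    q = round(x / scale)
    return max(-128, min(127, q))

def dequantize(q, scale):
    """Map the INT8 value back to a float."""
    return q * scale

def qdq(x, scale):
    """A Q/DQ pair: quantize, then immediately dequantize."""
    return dequantize(quantize(x, scale), scale)

def sigmoid_mul_add(x, residual):
    """The Sigmoid -> Mul -> Add pattern from the issue, in float."""
    return x * (1.0 / (1.0 + math.exp(-x))) + residual

def sigmoid_mul_add_with_output_qdq(x, residual, out_scale):
    """Same pattern with a Q/DQ pair after the Add.

    out_scale is a hypothetical, hand-picked value for this sketch.
    """
    return qdq(sigmoid_mul_add(x, residual), out_scale)

if __name__ == "__main__":
    print(sigmoid_mul_add(1.0, 0.5))
    print(sigmoid_mul_add_with_output_qdq(1.0, 0.5, out_scale=0.05))
```

The Q/DQ output is the float result snapped to the INT8 grid defined by `out_scale`, which is the information a Q/DQ pair at that position would give the builder about the Add's output.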

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!

ttyio avatar Nov 01 '22 02:11 ttyio