
[BERT/PyTorch] How to get the inference performance with INT8

Open Zack0617 opened this issue 3 years ago • 2 comments

Related to BERT/PyTorch

Describe the bug:

I want to reproduce the INT8 inference performance on T4 or A2, but I don't know how to reproduce it and compare against the inference numbers NVIDIA updates monthly on the following page. Could someone give some instructions? Thanks.

https://developer.nvidia.com/deep-learning-performance-training-inference

[Screenshot: NVIDIA's published INT8 inference performance table]

Zack0617 avatar Aug 01 '22 11:08 Zack0617

Hi @Zack0617, have you tried following the repro instructions in the guide referenced in your link? It says "Reproduce on your systems by following the instructions in the Measuring Training and Inferencing Performance on NVIDIA AI Platforms Reviewer’s Guide".

mk-nvidia avatar Aug 03 '22 17:08 mk-nvidia

Hi @mk-nvidia, thank you for your feedback. I have already tried following the "Measuring Training and Inferencing Performance on NVIDIA AI Platforms Reviewer’s Guide", but the inference benchmarks there are run at FP32 or FP16 precision, so they are not directly comparable with the INT8 numbers NVIDIA publishes in the page shown above.
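
For context, my understanding is that the published INT8 numbers are collected with TensorRT engines built with INT8 enabled, rather than with the FP16/FP32 PyTorch scripts in this repo. Below is a minimal sketch of what I am experimenting with, assuming TensorRT 8.x and a BERT ONNX export; the file name, input tensor name, and shapes are placeholders of mine, not the exact configuration behind NVIDIA's numbers.

```python
import tensorrt as trt

# Minimal sketch: build an INT8 TensorRT engine from a BERT ONNX export.
# Paths, tensor names, and shapes below are placeholders.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("bert_large.onnx", "rb") as f:  # placeholder ONNX export
    if not parser.parse(f.read()):
        raise RuntimeError(str(parser.get_error(0)))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # request INT8 kernels where supported
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 fallback for other layers
# A real INT8 build also needs either a calibrator (post-training quantization)
# or a model exported with Q/DQ nodes (quantization-aware training).

# Only needed if the export has dynamic input shapes; every dynamic input
# (input_ids, segment_ids, input_mask, ...) would need a shape like this.
profile = builder.create_optimization_profile()
profile.set_shape("input_ids", (1, 384), (8, 384), (8, 384))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open("bert_large_int8.engine", "wb") as f:
    f.write(serialized_engine)
```

It also looks like TensorRT's trtexec tool (e.g. `trtexec --onnx=bert_large.onnx --int8 --fp16`) can build and time an engine directly, reporting latency and throughput, but I am not sure whether that matches the methodology behind the published table.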

Zack0617 avatar Aug 05 '22 12:08 Zack0617