Suggestion: QAT mode for deployment
ONNX seems to be the de facto deployment option, and scripts already exist for training BasicSR weights and exporting them to an ONNX version of the model. However, to accelerate beyond AMP/fp16, INT8 quantization (kept accurate via QAT, i.e. fake-int8 training) can deliver far faster deployments without the usual quantization loss.
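As a rough illustration of what QAT looks like with PyTorch's built-in tooling, here is a minimal eager-mode sketch using `torch.ao.quantization`. `TinySR` is a hypothetical stand-in for an actual BasicSR architecture (it is not code from this repo), and the training loop is elided; the point is where the FakeQuantize observers get inserted and when the model is converted to real int8.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq  # torch.quantization on older PyTorch versions

# Hypothetical toy SR network standing in for a BasicSR architecture.
class TinySR(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks the fp32 -> int8 boundary
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * 4, 3, padding=1),
        )
        self.dequant = tq.DeQuantStub()  # marks the int8 -> fp32 boundary
        self.shuffle = nn.PixelShuffle(2)

    def forward(self, x):
        x = self.quant(x)
        x = self.body(x)
        x = self.dequant(x)
        return self.shuffle(x)           # upscaling head kept in fp32 here

model = TinySR().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")  # x86; "qnnpack" targets ARM
tq.prepare_qat(model, inplace=True)      # inserts FakeQuantize modules

# ... the normal BasicSR training loop would run here; fake-quant ops
# simulate int8 rounding so the weights learn to tolerate it ...

model.eval()
int8_model = tq.convert(model)           # real int8 weights/activations for CPU inference
```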
Video or near-real-time implementations built on quantized int8 ONNX models could be considerably more accurate if the trained model used FakeQuantize. CPU and mobile inference under quantization would also become usable for edge cases like thin laptops, ARM devices, Apple M1 silicon, and phones, without the quantization loss incurred when creating the base ONNX-int8 model after the fact. Fp16 on ONNX Runtime, or device-specific optimizers like OpenVINO and TensorRT, are also options, but they don't give nearly as large a speed improvement on unaccelerated systems and edge devices.
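For comparison, the base ONNX-int8 model mentioned above is typically produced by post-training quantization, which is exactly where the accuracy loss comes from. A hedged sketch of that path with ONNX Runtime's quantization tools is below; the file names, the `input` tensor name, and the random calibration patches are all placeholders, and a real calibration set would use actual low-resolution validation images.

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantType, quantize_static
)

class LRPatchReader(CalibrationDataReader):
    """Feeds a handful of low-resolution patches for calibration (illustrative data only)."""
    def __init__(self, num_samples=16):
        # Random patches purely for illustration; use real LR images in practice.
        self.samples = iter(
            {"input": np.random.rand(1, 3, 64, 64).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        return next(self.samples, None)

# "model_fp32.onnx" is assumed to be an export from the existing BasicSR -> ONNX
# script, with its input tensor named "input".
quantize_static(
    "model_fp32.onnx",
    "model_int8.onnx",
    calibration_data_reader=LRPatchReader(),
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)
```

A QAT-trained checkpoint would keep this same int8 deployment target but move the quantization decisions into training, so the exported model doesn't pay the accuracy penalty of calibrating after the fact.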
https://pytorch.org/docs/stable/quantization.html