Recommended Torch Quantization Library to Use -- ModelOpt vs. pytorch-quantization
Description
I have seen two quantization libraries built by NVIDIA: TensorRT Model Optimizer (ModelOpt) and pytorch-quantization. What are the differences between the two libraries?
My use case is to do PTQ (and potentially QAT) on a PyTorch model, export it to ONNX, and then convert it with TensorRT. For this workflow, which library should I choose?
Thanks,
TRT ModelOpt includes pytorch-quantization.
See https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_pytorch_quantization.html
ModelOpt's PyTorch quantization is a refactor of pytorch_quantization.
Key advantages offered by ModelOpt’s PyTorch quantization:
1. Support for advanced quantization formats, e.g., block-wise INT4 and FP8.
2. Native support for LLMs in Hugging Face and NeMo.
3. Advanced quantization algorithms, e.g., SmoothQuant and AWQ.
4. Deployment support for ONNX and NVIDIA TensorRT (see the sketch below).
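For the PTQ-to-ONNX-to-TensorRT path you describe, a minimal ModelOpt sketch might look like the following. This is hedged: it assumes the `mtq.quantize` API and the built-in `INT8_DEFAULT_CFG` config as documented at time of writing, and `MyModel`, `get_calib_dataloader`, and the input shape are hypothetical placeholders for your own model and data.

```python
import torch
import modelopt.torch.quantization as mtq

# Hypothetical model and calibration loader -- substitute your own.
model = MyModel().cuda().eval()
calib_loader = get_calib_dataloader()

# ModelOpt drives calibration through a user-supplied forward loop.
def forward_loop(model):
    with torch.no_grad():
        for batch in calib_loader:
            model(batch.cuda())

# PTQ: insert quantizers and calibrate. INT8_DEFAULT_CFG is one of the
# built-in configs; others cover FP8, INT4 AWQ, SmoothQuant, etc.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export to ONNX with Q/DQ nodes, then build a TensorRT engine from it
# (e.g., trtexec --onnx=model_int8.onnx --int8).
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy_input, "model_int8.onnx", opset_version=17)
```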
You can use either tool.
https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html#
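For comparison, the legacy pytorch-quantization flow looks roughly like the sketch below, based on the toolkit docs linked above. Again hedged: `MyModel`, `calib_loader`, and the input shape are hypothetical, and the default max calibrator is assumed.

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Patch torch.nn layers (Conv2d, Linear, ...) with quantized versions;
# this must run before the model is instantiated.
quant_modules.initialize()
model = MyModel().cuda().eval()  # hypothetical model

# Put all quantizers into calibration mode and collect statistics.
for _, module in model.named_modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()

with torch.no_grad():
    for batch in calib_loader:  # hypothetical calibration DataLoader
        model(batch.cuda())

# Load the computed amax values and switch back to quantized mode.
for _, module in model.named_modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        if module._calibrator is not None:
            module.load_calib_amax()
        module.enable_quant()
        module.disable_calib()

# Use TensorRT-compatible fake-quant ops when exporting to ONNX.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy_input, "model_int8.onnx", opset_version=13)
```

Both sketches end at an ONNX file with Q/DQ nodes that TensorRT can consume; the ModelOpt version is shorter mainly because `mtq.quantize` wraps the quantizer insertion and calibration steps that pytorch-quantization exposes manually.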
Gotcha! Thank you for the clarification!