Recommended Torch Quantization Library to Use -- ModelOpt vs. pytorch-quantization
Description
I have seen two quantization libraries built by NVIDIA: TensorRT Model Optimizer (ModelOpt) and pytorch-quantization. What are the differences between the two libraries?
My use case is to do PTQ (and potentially QAT) on a PyTorch model, export it to ONNX, and then convert it with TensorRT. For this workflow, which library should I choose?
Thanks,
TRT ModelOpt includes pytorch-quantization.
See https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_pytorch_quantization.html
ModelOpt's PyTorch quantization is a refactor of pytorch_quantization.
Key advantages offered by ModelOpt’s PyTorch quantization:
1. Support for advanced quantization formats, e.g., block-wise INT4 and FP8.
2. Native support for LLMs in Hugging Face and NeMo.
3. Advanced quantization algorithms, e.g., SmoothQuant and AWQ.
4. Deployment support for ONNX and NVIDIA TensorRT (see the sketch below).
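For the PTQ-to-ONNX-to-TensorRT path you describe, a minimal ModelOpt sketch might look like the following. This is hedged: it assumes the `mtq.quantize` API and the built-in `INT8_DEFAULT_CFG` config as documented at time of writing, and `MyModel`, `get_calib_dataloader`, and the input shape are hypothetical placeholders for your own model and data.

```python
import torch
import modelopt.torch.quantization as mtq

# Hypothetical model and calibration loader -- substitute your own.
model = MyModel().cuda().eval()
calib_loader = get_calib_dataloader()

# ModelOpt drives calibration through a user-supplied forward loop.
def forward_loop(model):
    with torch.no_grad():
        for batch in calib_loader:
            model(batch.cuda())

# PTQ: insert quantizers and calibrate. INT8_DEFAULT_CFG is one of the
# built-in configs; others cover FP8, INT4 AWQ, SmoothQuant, etc.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export to ONNX with Q/DQ nodes, then build a TensorRT engine from it
# (e.g., trtexec --onnx=model_int8.onnx --int8).
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy_input, "model_int8.onnx", opset_version=17)
```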
You can use either tool.
https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html#
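For comparison, the legacy pytorch-quantization flow looks roughly like the sketch below, based on the toolkit docs linked above. Again hedged: `MyModel`, `calib_loader`, and the input shape are hypothetical, and the default max calibrator is assumed.

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Patch torch.nn layers (Conv2d, Linear, ...) with quantized versions;
# this must run before the model is instantiated.
quant_modules.initialize()
model = MyModel().cuda().eval()  # hypothetical model

# Put all quantizers into calibration mode and collect statistics.
for _, module in model.named_modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()

with torch.no_grad():
    for batch in calib_loader:  # hypothetical calibration DataLoader
        model(batch.cuda())

# Load the computed amax values and switch back to quantized mode.
for _, module in model.named_modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        if module._calibrator is not None:
            module.load_calib_amax()
        module.enable_quant()
        module.disable_calib()

# Use TensorRT-compatible fake-quant ops when exporting to ONNX.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy_input, "model_int8.onnx", opset_version=13)
```

Both sketches end at an ONNX file with Q/DQ nodes that TensorRT can consume; the ModelOpt version is shorter mainly because `mtq.quantize` wraps the quantizer insertion and calibration steps that pytorch-quantization exposes manually.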
Gotcha! Thank you for the clarification!