TensorRT-LLM
How to quantize custom models, such as LVMs?
According to https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/quantization, can I define my own model and calibration process, and then simply call modelopt.torch.quantization.quantize()?
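For context, here is a minimal sketch of the workflow being asked about, based on the `modelopt.torch.quantization` API (`mtq.quantize(model, config, forward_loop)`). The model, the calibration data loader, and the choice of `INT8_DEFAULT_CFG` are illustrative assumptions, not taken from the linked example; this requires the `nvidia-modelopt` package and is untested here.

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq  # from the nvidia-modelopt package

# A custom model stands in for an LVM here (illustrative assumption).
model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 256))

# Calibration forward loop: run representative data through the model so
# ModelOpt can collect activation statistics for the quantizers.
def forward_loop(m):
    m.eval()
    with torch.no_grad():
        for _ in range(8):  # a handful of calibration batches
            m(torch.randn(4, 256))  # replace with real calibration data

# Quantize in place using a built-in config (INT8 shown as an example;
# FP8/INT4-AWQ configs also exist in mtq).
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```

After quantization, the TensorRT-LLM examples export a checkpoint from the quantized model before building an engine; whether a fully custom architecture is supported by that export step is the part that likely needs confirmation from the maintainers.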