Quantization topic
hqq
Official implementation of Half-Quadratic Quantization (HQQ)
marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at batch sizes up to a medium range of 16-32 tokens.
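The entries above center on weight-only INT4 quantization. As a rough illustration of the idea (not marlin's actual CUDA kernel, which fuses dequantization into the FP16 matmul), here is a minimal sketch of symmetric per-group INT4 quantize/dequantize; the group values and the half-step error bound are illustrative assumptions.

```python
# Illustrative weight-only INT4 quantization (NOT marlin's kernel):
# map a float weight group to 4-bit signed integers with one per-group
# scale, then dequantize back to float for a reference computation.

def quantize_int4(weights, qmin=-8, qmax=7):
    """Symmetric per-group INT4 quantization: returns (int codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(max(round(w / scale), qmin), qmax) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover float approximations from INT4 codes and the group scale."""
    return [v * scale for v in q]

group = [0.12, -0.53, 0.31, 0.07, -0.88, 0.44, -0.02, 0.66]
q, s = quantize_int4(group)
deq = dequantize_int4(q, s)
err = max(abs(a - b) for a, b in zip(group, deq))
```

Because every code fits in 4 bits, two weights pack into one byte, which is where the memory-bandwidth savings behind the ~4x speedup come from.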
KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
hailo_model_zoo
The Hailo Model Zoo includes pre-trained models and a full environment for building and evaluating them
Indic-Subtitler
Open source subtitling platform 💻 for transcribing and translating videos/audios in Indic languages.
owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".
LSQFakeQuantize-PyTorch
FakeQuantize with Learned Step Size (LSQ+) as an Observer in PyTorch
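For context on what a fake-quantize observer computes: the forward pass quantizes and immediately dequantizes, so activations stay in float while carrying quantization error. A framework-free sketch under assumed INT8 bounds follows; in LSQ/LSQ+ the step size `s` is a learned parameter updated via a straight-through estimator, which this forward-only sketch omits.

```python
# Minimal sketch of the fake-quantize forward used in LSQ-style QAT.
# (Forward only; the learned-step-size gradient of LSQ/LSQ+ is omitted.)

def fake_quantize(x, s, qmin=-128, qmax=127):
    """Quantize-dequantize x with step size s; output remains a float."""
    q = min(max(round(x / s), qmin), qmax)  # round to grid, clamp to int range
    return q * s                            # dequantize back to float

s = 0.05  # assumed step size for illustration
vals = [0.1234, -0.02, 3.2, -9.9]
fq = [fake_quantize(v, s) for v in vals]
```

Values inside the representable range land on the nearest multiple of `s` (0.1234 becomes 0.1); out-of-range values saturate at `qmin * s` or `qmax * s` (-9.9 clamps to -6.4 here).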
CPT
[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin
LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.