Quantization topic
hqq
Official implementation of Half-Quadratic Quantization (HQQ)
marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at batch sizes up to a medium range of 16-32 tokens.
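The entries above center on weight-only INT4 quantization. As a rough illustration of the idea (not marlin's actual CUDA kernel, which fuses dequantization into the FP16 matmul), here is a minimal sketch of symmetric per-group INT4 quantize/dequantize; the group values and the half-step error bound are illustrative assumptions.

```python
# Illustrative weight-only INT4 quantization (NOT marlin's kernel):
# map a float weight group to 4-bit signed integers with one per-group
# scale, then dequantize back to float for a reference computation.

def quantize_int4(weights, qmin=-8, qmax=7):
    """Symmetric per-group INT4 quantization: returns (int codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(max(round(w / scale), qmin), qmax) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover float approximations from INT4 codes and the group scale."""
    return [v * scale for v in q]

group = [0.12, -0.53, 0.31, 0.07, -0.88, 0.44, -0.02, 0.66]
q, s = quantize_int4(group)
deq = dequantize_int4(q, s)
err = max(abs(a - b) for a, b in zip(group, deq))
```

Because every code fits in 4 bits, two weights pack into one byte, which is where the memory-bandwidth savings behind the ~4x speedup come from.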
KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
hailo_model_zoo
The Hailo Model Zoo includes pre-trained models and a full environment for building and evaluating them
Indic-Subtitler
Open source subtitling platform 💻 for transcribing and translating videos/audios in Indic languages.
owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".
LSQFakeQuantize-PyTorch
FakeQuantize with Learned Step Size (LSQ+) as an Observer in PyTorch
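For context on what a fake-quantize observer computes: the forward pass quantizes and immediately dequantizes, so activations stay in float while carrying quantization error. A framework-free sketch under assumed INT8 bounds follows; in LSQ/LSQ+ the step size `s` is a learned parameter updated via a straight-through estimator, which this forward-only sketch omits.

```python
# Minimal sketch of the fake-quantize forward used in LSQ-style QAT.
# (Forward only; the learned-step-size gradient of LSQ/LSQ+ is omitted.)

def fake_quantize(x, s, qmin=-128, qmax=127):
    """Quantize-dequantize x with step size s; output remains a float."""
    q = min(max(round(x / s), qmin), qmax)  # round to grid, clamp to int range
    return q * s                            # dequantize back to float

s = 0.05  # assumed step size for illustration
vals = [0.1234, -0.02, 3.2, -9.9]
fq = [fake_quantize(v, s) for v in vals]
```

Values inside the representable range land on the nearest multiple of `s` (0.1234 becomes 0.1); out-of-range values saturate at `qmin * s` or `qmax * s` (-9.9 clamps to -6.4 here).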
CPT
[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin
LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.