quantization topic

List quantization repositories

hqq

508 Stars, 49 Forks

Official implementation of Half-Quadratic Quantization (HQQ)

marlin

607 Stars, 46 Forks

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

KVQuant

286 Stars, 25 Forks

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

quanto

610 Stars, 32 Forks

A PyTorch quantization toolkit

hailo_model_zoo

250 Stars, 40 Forks

The Hailo Model Zoo includes pre-trained models and a full build and evaluation environment.

Indic-Subtitler

75 Stars, 12 Forks

Open-source subtitling platform 💻 for transcribing and translating video and audio in Indic languages.

owq

60 Stars, 7 Forks

Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".

LSQFakeQuantize-PyTorch

31 Stars, 6 Forks

FakeQuantize with Learned Step Size (LSQ+) as an observer in PyTorch

CPT

29 Stars, 6 Forks

[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

LightCompress

649 Stars, 64 Forks, 649 Watchers

[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.