AQLM
Official PyTorch repository for *Extreme Compression of Large Language Models via Additive Quantization* (https://arxiv.org/pdf/2401.06118.pdf)
Hi! Thanks for such a useful tool! I have a question about `model_seqlen`: as far as I can see, the default value in `main.py` is 4096. What if I use smaller values...
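(For intuition, a hedged sketch of what `model_seqlen` usually controls in GPTQ/AQLM-style pipelines: calibration text is tokenized into one flat token stream and split into `model_seqlen`-token chunks, so a smaller value yields more, shorter calibration sequences. The helper below is hypothetical and illustrative, not the repo's actual code.)

```python
import torch

def chunk_calibration_tokens(token_ids: torch.Tensor, model_seqlen: int) -> torch.Tensor:
    """Split a flat token stream into (n_chunks, model_seqlen) calibration sequences."""
    n_chunks = token_ids.numel() // model_seqlen
    return token_ids[: n_chunks * model_seqlen].reshape(n_chunks, model_seqlen)

tokens = torch.randint(0, 32_000, (100_000,))        # stand-in for tokenized calibration data
print(chunk_calibration_tokens(tokens, 4096).shape)  # default: fewer, longer sequences
print(chunk_calibration_tokens(tokens, 2048).shape)  # smaller value: more, shorter sequences
```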
Congratulations on coming up with such an excellent quantization algorithm! I'm trying to use AQLM to quantize DeepSeek-Coder and StarCoder2, but the repository doesn't seem to have direct support. Are...
hugginface -> Hugging Face
Are Ollama or other frameworks and tools capable of serving AQLM-quantized models?
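For reference, here is a minimal sketch of serving an AQLM checkpoint through Hugging Face `transformers`, which ships AQLM support (it requires `pip install aqlm[gpu]` alongside a recent `transformers`); the model ID below is one of the publicly released checkpoints and is used only as an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # example released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```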
I found these clones while porting the CUDA kernels to vLLM. I couldn't see what they were for (to avoid memory fragmentation?), but got a 2% speed improvement on your Llama 2...
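(One plausible explanation, offered as a guess rather than the authors' stated rationale: cloning a strided view materializes a dense, contiguous buffer before a CUDA kernel reads it, which can be cheaper than strided access. A toy illustration:)

```python
import torch

x = torch.randn(8, 1024)
view = x[:, ::2]                   # strided view into x, not contiguous
print(view.is_contiguous())        # False
dense = view.clone().contiguous()  # fresh, dense allocation the kernel can read linearly
print(dense.is_contiguous())       # True
```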
Hi, we have performed a small experiment on fine-tuning the Llama-2-70B-AQLM-2Bit model using the PEFT QLoRA method. We utilized the Alpaca and Glaive datasets for instruction tuning, and the fine-tuned...
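For readers who want to reproduce a setup like this, below is a minimal, hedged sketch of attaching LoRA adapters to an AQLM checkpoint with PEFT. The checkpoint ID, LoRA hyperparameters, and target modules are illustrative choices, and the dataset and trainer wiring are omitted:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Llama-2-70b-AQLM-2Bit-1x16-hf",  # example released checkpoint
    torch_dtype="auto",
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical Llama attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```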
Hi, I've been encountering this problem when running this [notebook](https://colab.research.google.com/drive/1-xZmBRXT5Fm3Ghn4Mwa2KRypORXb855X?usp=sharing#scrollTo=ZDOtpnJGizsx) on a local machine with an L4 GPU. I've followed the instructions from the notebook on `Python 3.10`.
```
Name: torch
Version: 2.2.0...
```
Does quantizing a model take too long?
What is the expected time to quantize a 7B Mistral model?