AQLM
Official PyTorch repository for *Extreme Compression of Large Language Models via Additive Quantization* (https://arxiv.org/pdf/2401.06118.pdf)
Hi! Thanks for such a useful tool! I have a question about `model_seqlen`: as far as I can see, the default value in `main.py` is 4096. What if I use smaller values...
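(For intuition, a hedged sketch of what `model_seqlen` usually controls in GPTQ/AQLM-style pipelines: calibration text is tokenized into one flat token stream and split into `model_seqlen`-token chunks, so a smaller value yields more, shorter calibration sequences. The helper below is hypothetical and illustrative, not the repo's actual code.)

```python
import torch

def chunk_calibration_tokens(token_ids: torch.Tensor, model_seqlen: int) -> torch.Tensor:
    """Split a flat token stream into (n_chunks, model_seqlen) calibration sequences."""
    n_chunks = token_ids.numel() // model_seqlen
    return token_ids[: n_chunks * model_seqlen].reshape(n_chunks, model_seqlen)

tokens = torch.randint(0, 32_000, (100_000,))        # stand-in for tokenized calibration data
print(chunk_calibration_tokens(tokens, 4096).shape)  # default: fewer, longer sequences
print(chunk_calibration_tokens(tokens, 2048).shape)  # smaller value: more, shorter sequences
```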
Congratulations on coming up with such an excellent quantization algorithm! I'm trying to use AQLM to quantize DeepSeek-Coder and StarCoder2, but the repository doesn't seem to have direct support. Are...
hugginface -> Hugging Face
Are Ollama or other frameworks and tools capable of serving AQLM-quantized models?
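For reference, here is a minimal sketch of serving an AQLM checkpoint through Hugging Face `transformers`, which ships AQLM support (it requires `pip install aqlm[gpu]` alongside a recent `transformers`); the model ID below is one of the publicly released checkpoints and is used only as an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # example released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```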
I found these clones while porting the CUDA kernels to vLLM. I couldn't see what they were for (to avoid memory fragmentation?), but got a 2% speed improvement on your Llama 2...
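(One plausible explanation, offered as a guess rather than the authors' stated rationale: cloning a strided view materializes a dense, contiguous buffer before a CUDA kernel reads it, which can be cheaper than strided access. A toy illustration:)

```python
import torch

x = torch.randn(8, 1024)
view = x[:, ::2]                   # strided view into x, not contiguous
print(view.is_contiguous())        # False
dense = view.clone().contiguous()  # fresh, dense allocation the kernel can read linearly
print(dense.is_contiguous())       # True
```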
Hi, we have performed a small experiment on fine-tuning the Llama-2-70B-AQLM-2Bit model using the PEFT QLoRA method. We utilized the Alpaca and Glaive datasets for instruction tuning, and the fine-tuned...
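For readers who want to reproduce a setup like this, below is a minimal, hedged sketch of attaching LoRA adapters to an AQLM checkpoint with PEFT. The checkpoint ID, LoRA hyperparameters, and target modules are illustrative choices, and the dataset and trainer wiring are omitted:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Llama-2-70b-AQLM-2Bit-1x16-hf",  # example released checkpoint
    torch_dtype="auto",
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical Llama attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```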
Hi, I've been encountering this problem when running this [notebook](https://colab.research.google.com/drive/1-xZmBRXT5Fm3Ghn4Mwa2KRypORXb855X?usp=sharing#scrollTo=ZDOtpnJGizsx) on a local machine with an L4 GPU. I've followed the instructions from the notebook on `Python 3.10`.
```
Name: torch
Version: 2.2.0...
```
Does quantizing a model take too long?
What is the expected time to quantize a 7B Mistral model?