AWQ: Activation-aware Weight Quantization???
https://github.com/mit-han-lab/llm-awq
AWQ: Activation-aware Weight Quantization sounds interesting.
Are you interested in implementing this new algorithm in Llama.cpp? The performance at 3 bits seems amazing.
Sorry, I'm not a pro developer, just a user with a tech background who cares about open LLM news.
If my understanding is correct, GPTQ and AWQ are stored in very similar formats and could share a single format. Additionally, GGML's current support for the GPTQ format is imperfect; improving it could save roughly another 0.5 bits per weight.
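For anyone unfamiliar with how these formats relate, here is a minimal sketch (not the actual llm-awq or llama.cpp code) of the two ideas in play: AWQ rescales weight channels by activation magnitude before quantizing, and both GPTQ- and AWQ-style formats store weights in groups with a per-group scale and zero point. The function name, group size of 128, and the fixed `0.5` scaling exponent (which AWQ actually searches over) are all assumptions for illustration.

```python
# Sketch of AWQ-style grouped low-bit quantization (illustrative only).
import numpy as np

def awq_style_quantize(W, act_scale, n_bits=3, group_size=128):
    """Quantize W (out_features x in_features) to n_bits and dequantize.

    act_scale: per-input-channel activation magnitudes (in_features,),
    used to rescale channels before quantization (the AWQ idea).
    """
    # Scale salient input channels up so rounding error affects them less.
    s = act_scale ** 0.5          # assumed exponent; AWQ searches for the best one
    Ws = W * s

    qmax = 2 ** n_bits - 1
    out = np.empty_like(Ws)
    # Per-group asymmetric quantization, similar to GPTQ/AWQ storage:
    # each group keeps one floating-point scale and one zero point.
    for g in range(0, Ws.shape[1], group_size):
        blk = Ws[:, g:g + group_size]
        lo = blk.min(axis=1, keepdims=True)
        hi = blk.max(axis=1, keepdims=True)
        scale = (hi - lo) / qmax
        scale[scale == 0] = 1.0   # avoid division by zero for constant groups
        q = np.clip(np.round((blk - lo) / scale), 0, qmax)
        out[:, g:g + group_size] = q * scale + lo   # dequantize
    return out / s                # undo the per-channel scale

# Tiny usage example with random data:
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)
act = np.abs(rng.normal(size=256)).astype(np.float32) + 1e-3
W_dq = awq_style_quantize(W, act, n_bits=3)
print("mean abs error:", np.abs(W - W_dq).mean())
```

Since both formats boil down to the same per-group scale/zero-point layout, a shared on-disk representation seems plausible; the AWQ channel scales can even be folded into the previous layer's weights so the stored format is unchanged.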
Hi everyone, I have made a PR to add AWQ. I would really appreciate comments to make it better, thanks! The PR
This issue was closed because it has been inactive for 14 days since being marked as stale.