AWQ: Activation-aware Weight Quantization???
https://github.com/mit-han-lab/llm-awq
AWQ: Activation-aware Weight Quantization sounds interesting.
Are you interested in implementing this new algorithm in Llama.cpp? The performance at 3 bits seems amazing.
Sorry, I'm not a pro developer, just a user with a tech background who cares about open LLM news.
If my understanding is correct, GPTQ and AWQ are stored in very similar formats and could share a single format. Additionally, GGML's current support for the GPTQ format is imperfect; improving it could save roughly another 0.5 bits per weight.
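For anyone unfamiliar with how these formats relate, here is a minimal sketch (not the actual llm-awq or llama.cpp code) of the two ideas in play: AWQ rescales weight channels by activation magnitude before quantizing, and both GPTQ- and AWQ-style formats store weights in groups with a per-group scale and zero point. The function name, group size of 128, and the fixed `0.5` scaling exponent (which AWQ actually searches over) are all assumptions for illustration.

```python
# Sketch of AWQ-style grouped low-bit quantization (illustrative only).
import numpy as np

def awq_style_quantize(W, act_scale, n_bits=3, group_size=128):
    """Quantize W (out_features x in_features) to n_bits and dequantize.

    act_scale: per-input-channel activation magnitudes (in_features,),
    used to rescale channels before quantization (the AWQ idea).
    """
    # Scale salient input channels up so rounding error affects them less.
    s = act_scale ** 0.5          # assumed exponent; AWQ searches for the best one
    Ws = W * s

    qmax = 2 ** n_bits - 1
    out = np.empty_like(Ws)
    # Per-group asymmetric quantization, similar to GPTQ/AWQ storage:
    # each group keeps one floating-point scale and one zero point.
    for g in range(0, Ws.shape[1], group_size):
        blk = Ws[:, g:g + group_size]
        lo = blk.min(axis=1, keepdims=True)
        hi = blk.max(axis=1, keepdims=True)
        scale = (hi - lo) / qmax
        scale[scale == 0] = 1.0   # avoid division by zero for constant groups
        q = np.clip(np.round((blk - lo) / scale), 0, qmax)
        out[:, g:g + group_size] = q * scale + lo   # dequantize
    return out / s                # undo the per-channel scale

# Tiny usage example with random data:
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)
act = np.abs(rng.normal(size=256)).astype(np.float32) + 1e-3
W_dq = awq_style_quantize(W, act, n_bits=3)
print("mean abs error:", np.abs(W - W_dq).mean())
```

Since both formats boil down to the same per-group scale/zero-point layout, a shared on-disk representation seems plausible; the AWQ channel scales can even be folded into the previous layer's weights so the stored format is unchanged.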
Hi everyone, I have made a PR to add AWQ. I would really appreciate comments to make it better, thanks! The PR
This issue was closed because it has been inactive for 14 days since being marked as stale.