
AWQ: Activation-aware Weight Quantization???

Open UnsGentoals opened this issue 2 years ago • 3 comments

https://github.com/mit-han-lab/llm-awq

AWQ: Activation-aware Weight Quantization sounds interesting 🧐🧐🧐
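
For anyone skimming, here is a minimal numpy sketch of the core idea as I understand it from the paper: derive per-input-channel scales from calibration activation magnitudes, fold them into the weights before round-to-nearest group quantization, and undo them afterwards. The function names, shapes, group size, and grid search below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def quantize_groupwise(w, n_bits=3, group_size=128):
    """Round-to-nearest, asymmetric quantize/dequantize of w in groups along the input dim."""
    out = np.empty_like(w)
    qmax = 2 ** n_bits - 1
    for start in range(0, w.shape[1], group_size):
        g = w[:, start:start + group_size]
        lo = g.min(axis=1, keepdims=True)
        hi = g.max(axis=1, keepdims=True)
        scale = np.maximum(hi - lo, 1e-8) / qmax
        q = np.clip(np.round((g - lo) / scale), 0, qmax)
        out[:, start:start + group_size] = q * scale + lo   # dequantized back to float
    return out

def awq_scale_search(w, x_calib, n_bits=3, grid=20):
    """Grid-search activation-aware per-channel scales that minimize output error.

    w: (out_features, in_features) weight matrix
    x_calib: (n_samples, in_features) calibration activations
    """
    act_mag = np.abs(x_calib).mean(axis=0) + 1e-8            # per-channel activation magnitude
    ref = x_calib @ w.T                                      # full-precision reference output
    best_err, best_s = np.inf, np.ones(w.shape[1])
    for alpha in np.linspace(0.0, 1.0, grid):
        s = act_mag ** alpha
        s /= np.sqrt(s.max() * s.min())                      # keep scales centered around 1
        w_q = quantize_groupwise(w * s, n_bits) / s          # quantize scaled weights, then undo
        err = np.mean((x_calib @ w_q.T - ref) ** 2)
        if err < best_err:
            best_err, best_s = err, s
    return best_s
```

In the paper the winning scales are folded into the preceding layer or the runtime activations, so only the quantized weights need to be stored.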

UnsGentoals · Jun 03 '23 15:06

Are you interested in implementing this new algorithm in llama.cpp? The performance at 3 bits seems amazing.

shouyiwang · Jun 04 '23 06:06

Sorry, I'm not a pro developer 😔, just a user with a tech background who cares about open LLM news. ☺️

UnsGentoals · Jun 04 '23 07:06

If my understanding is correct, GPTQ and AWQ use very similar storage formats and could share a single format. Additionally, GGML currently has imperfect support for the GPTQ format, and improving it could save about 0.5 bits per weight.
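
To make the format point concrete, here is a rough numpy sketch of the group-quantized layout both GPTQ- and AWQ-style checkpoints typically carry: low-bit codes packed into 32-bit words plus per-group scales and zero points. The 4-bit width, group size of 128, and helper names are assumptions for illustration, not the actual on-disk layout of either tool or of GGML:

```python
import numpy as np

def pack_int4(q):
    """Pack 4-bit codes (0..15) into uint32 words, eight codes per word."""
    q = q.astype(np.uint32).reshape(-1, 8)
    packed = np.zeros(q.shape[0], dtype=np.uint32)
    for i in range(8):
        packed |= q[:, i] << np.uint32(4 * i)
    return packed

def quantize_tensor(w, group_size=128, n_bits=4):
    """Asymmetric per-group quantization; rows of w are output channels.

    Assumes in_features is a multiple of group_size. Returns packed codes plus
    one (scale, zero point) pair per group, which is what makes GPTQ- and
    AWQ-style checkpoints look alike.
    """
    qmax = 2 ** n_bits - 1
    g = w.reshape(w.shape[0], -1, group_size)                # (out, n_groups, group_size)
    lo = g.min(axis=2, keepdims=True)
    hi = g.max(axis=2, keepdims=True)
    scales = np.maximum(hi - lo, 1e-8) / qmax                # float scale per group
    zeros = np.round(-lo / scales)                           # integer zero point per group
    q = np.clip(np.round(g / scales + zeros), 0, qmax)       # low-bit codes
    return pack_int4(q), scales.squeeze(2), zeros.squeeze(2)

# Dequantization is the same either way: (codes - zeros) * scales, group by group.
```

Seen this way, converting between the two layouts is mostly a matter of repacking codes and metadata rather than re-quantizing, which is presumably what the comment above means by storing them in the same format.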

qwopqwop200 · Jun 04 '23 09:06

Hi everyone, I have opened a PR to add AWQ. I would really appreciate comments to make it better, thanks! The PR

namtranase · Dec 22 '23 10:12

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] · Apr 10 '24 01:04