
4 comments of jane

Hi @frobnitzem, thanks for your reply. I got it.

Hi @tonylins Yes, I used llm-awq/awq_cache/llama-7b-w4-g128.pt. The commands to fetch the cache and generate real quantized weights (INT4) were:
- git clone https://huggingface.co/datasets/mit-han-lab/awq-model-zoo awq_cache
- python -m awq.entry --model_path ../models/llama-2-7b-hf --w_bit 4 --q_group_size 128 --load_awq...
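
For context, a sketch of what the full real-quantization step looks like per the llm-awq README; everything after the truncated --load_awq above is my assumption about the intended invocation, not a record of what was actually run:
- python -m awq.entry --model_path ../models/llama-2-7b-hf --w_bit 4 --q_group_size 128 --load_awq awq_cache/llama-2-7b-w4-g128.pt --q_backend real --dump_quant quant_cache/llama-2-7b-w4-g128-awq.pt

Note that the AWQ search result passed to --load_awq has to match the base model: llama-7b-w4-g128.pt named above is the LLaMA-1 cache, while --model_path points at Llama-2-7b, which appears to be the mismatch raised in the next comment.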

Hi @casper-hansen, thanks for your advice. I don't quite understand why you think I'm using the llama-1 cache; I specified on the command line to use llama-2-7b-w4-g128.pt. There are no changes...

Hi @Sakits, thanks for your reply. I verified the fake quantization PPL with awq_cache/llama-2-7b-w4-g128.pt and got a PPL of 16.387:
- python -m awq.entry --model_path ../models/llama-2-7b --tasks wikitext --w_bit 4 --q_group_size...
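
For completeness, a sketch of the full fake-quantization evaluation command per the llm-awq README; the flags past the truncation above (--q_group_size 128, --load_awq ..., --q_backend fake) are my assumption:
- python -m awq.entry --model_path ../models/llama-2-7b --tasks wikitext --w_bit 4 --q_group_size 128 --load_awq awq_cache/llama-2-7b-w4-g128.pt --q_backend fake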