LLaMA3-Quantization

A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.

12 LLaMA3-Quantization issues

Hi, in Table 2, which metric do you report? For example, for arc_easy, lm_eval has two metrics, acc and acc_norm; which one do you use?
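For reference, a minimal sketch of pulling both metrics out of lm-evaluation-harness so they can be compared against Table 2. The lm_eval >= 0.4 API and the base `meta-llama/Meta-Llama-3-8B` checkpoint are assumptions here, not something the authors have confirmed:

```python
# Minimal sketch (assumes lm_eval >= 0.4 and access to the base Llama 3 weights).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B",
    tasks=["arc_easy"],
)

# In lm_eval 0.4.x, per-task metrics are keyed as "<metric>,<filter>".
arc = results["results"]["arc_easy"]
print("acc:     ", arc.get("acc,none"))
print("acc_norm:", arc.get("acc_norm,none"))
```

Printing both makes it easy to see which of the two numbers matches the paper's table.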

Hi, thank you for this work! I saw in your paper that you also ran benchmarks on MLLMs. Do you have the code for that too? Thanks!

Hi, I am wondering which "transformers" version you were targeting when writing "int_llama_layer.py". Thanks in advance!

Very useful repo! Are there any plans to release LLaMA3-70B 8-bit (SmoothQuant), which maintains the same average performance as the original LLaMA3-70B?

Hi, thank you for the nice work. Do you have an installation guide for the running environment? Thank you!

Hi, you have provided a GPTQ int4 quantized model at https://huggingface.co/LLMQ/LLaMA-3-8B-GPTQ-4bit-b128/tree/main. How can I evaluate its metrics with your code? I have tried running "sh scripts/eval_gptq_commonsenseqa.sh", but it only evaluates the FP32 model's metrics; when I remove...
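Not an official answer, but one hedged way to sanity-check that the quantized weights (rather than an FP32 fallback) are actually being loaded: recent transformers releases (4.32+, with optimum and auto-gptq installed) can load GPTQ checkpoints straight from the Hub, assuming the repo ships a quantization config. Whether this matches the repo's own eval scripts is an assumption:

```python
# Hedged sketch: load the released GPTQ int4 checkpoint directly
# (assumes transformers >= 4.32 plus `pip install optimum auto-gptq accelerate`).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LLMQ/LLaMA-3-8B-GPTQ-4bit-b128"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# If the GPTQ weights were picked up, the linear layers should be quantized
# modules rather than plain torch.nn.Linear.
print(type(model.model.layers[0].mlp.down_proj))
```

If the eval script still reports FP32 numbers, it may be reloading the base checkpoint internally instead of this quantized one.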

Do you plan to release 5-bit and 6-bit quantizations?

If the results are based on the base pretrained models, is there a plan to release results/models for the instruction-tuned models? Thanks!

Very useful repo, thank you! Are there any plans to release the 2-bit model files? Right now I think the HF link has an empty [QuIP](https://huggingface.co/LLMQ/LLaMA-3-8B-QuIP-2bit) repo, and for...

This test shows the EXL2 quant is the best: https://huggingface.co/blog/wolfram/llm-comparison-test-llama-3. Will you test it as well?