LLaMA3-Quantization

A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.

12 LLaMA3-Quantization issues

Hi, in Table 2, which metric do you report? For example, for arc_easy, lm_eval has two metrics, acc and acc_norm; which one do you use?
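For reference, a minimal sketch of pulling both metrics out of lm-evaluation-harness so they can be compared against Table 2. The lm_eval >= 0.4 API and the base `meta-llama/Meta-Llama-3-8B` checkpoint are assumptions here, not something the authors have confirmed:

```python
# Minimal sketch (assumes lm_eval >= 0.4 and access to the base Llama 3 weights).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B",
    tasks=["arc_easy"],
)

# In lm_eval 0.4.x, per-task metrics are keyed as "<metric>,<filter>".
arc = results["results"]["arc_easy"]
print("acc:     ", arc.get("acc,none"))
print("acc_norm:", arc.get("acc_norm,none"))
```

Printing both makes it easy to see which of the two numbers matches the paper's table.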

Hi, thank you for this work! I saw in your paper that you also ran benchmarks on MLLMs. Do you have the code for that too? Thanks!

Hi, I am wondering which "transformers" version you were targeting when writing "int_llama_layer.py". Thanks in advance!

Very useful repo! Are there any plans to release LLaMA3-70B 8-bit (SmoothQuant), which maintains the same average performance as the original LLaMA3-70B?

Hi, thank you for the nice work. Do you have an installation guide for the running environment? Thank you!

Hi, you have provided a GPTQ int4 quantized model at https://huggingface.co/LLMQ/LLaMA-3-8B-GPTQ-4bit-b128/tree/main. How can I evaluate its metrics with your code? I have tried running "sh scripts/eval_gptq_commonsenseqa.sh", but it only evaluates the FP32 model's metrics; when I remove...
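Not an official answer, but one hedged way to sanity-check that the quantized weights (rather than an FP32 fallback) are actually being loaded: recent transformers releases (4.32+, with optimum and auto-gptq installed) can load GPTQ checkpoints straight from the Hub, assuming the repo ships a quantization config. Whether this matches the repo's own eval scripts is an assumption:

```python
# Hedged sketch: load the released GPTQ int4 checkpoint directly
# (assumes transformers >= 4.32 plus `pip install optimum auto-gptq accelerate`).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LLMQ/LLaMA-3-8B-GPTQ-4bit-b128"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# If the GPTQ weights were picked up, the linear layers should be quantized
# modules rather than plain torch.nn.Linear.
print(type(model.model.layers[0].mlp.down_proj))
```

If the eval script still reports FP32 numbers, it may be reloading the base checkpoint internally instead of this quantized one.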

Do you plan to release 5-bit and 6-bit quantizations?

If the results are based on the base pretrained models, is there a plan to release results/models for the instruction-tuned models? Thanks!

Very useful repo, thank you! Are there any plans to release the 2-bit model files? Right now I think the HF link has an empty [QuIP](https://huggingface.co/LLMQ/LLaMA-3-8B-QuIP-2bit) repo, and for...

This test shows the EXL2 quant is the best: https://huggingface.co/blog/wolfram/llm-comparison-test-llama-3. Will you test it as well?