LLaMA3-Quantization
Evaluation of GGUF?
GGUF (the llama.cpp format) is a very popular format for quantized models. It would be nice to see it evaluated, including quant types like Q6_K (0.16% perplexity difference vs. FP16).
Evaluations for Llama 2 70B are available in the llama.cpp project: https://github.com/ggerganov/llama.cpp/blob/master/examples/perplexity/README.md
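For context on what a figure like "0.16% PPL difference vs. FP16" means, here is a minimal sketch. It assumes the difference is expressed as the relative change (PPL_quant - PPL_fp16) / PPL_fp16 × 100, and that perplexity is the exponential of the mean per-token negative log-likelihood. The function names and the perplexity values below are hypothetical, chosen only for illustration. They are not measured results and not llama.cpp's own code.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


def relative_ppl_diff_pct(ppl_quant, ppl_fp16):
    """Percent change of a quantized model's perplexity vs. the FP16 baseline (assumed definition)."""
    return (ppl_quant - ppl_fp16) / ppl_fp16 * 100.0


# Hypothetical perplexities for illustration only, not real measurements.
ppl_fp16 = 5.0
ppl_q6_k = 5.008
print(f"{relative_ppl_diff_pct(ppl_q6_k, ppl_fp16):.2f}%")  # prints "0.16%"
```

Because the difference is relative, the same absolute perplexity gap weighs more for a model with a low baseline perplexity. This is one reason to compare quants against each model's own FP16 run rather than across models.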