
Bug: Quantizing Llama 3.1 70B to Q4_K_S with imatrix gives NaN

Open bartowski1182 opened this issue 1 year ago • 3 comments

What happened?

Quantizing Llama 3.1 70B to Q4_K_S with an imatrix gives a NaN at block 48.

Tagging @slaren because you always seem to solve these

I haven't seen it yet on any other quant size.

Name and Version

b3441

What operating system are you seeing the problem on?

Linux

Relevant log output

ggml_validate_row_data: found nan value at block 48

bartowski1182 · Jul 23 '24 23:07
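
For context, before writing a quantized file ggml validates each row of tensor data and rejects non-finite values. A minimal sketch of that kind of check, simplified to f32 data and not ggml's actual `ggml_validate_row_data` implementation, looks like this:

```cpp
#include <cmath>
#include <cstdio>

// Simplified sketch of a per-block NaN/Inf scan over f32 row data.
// The real ggml_validate_row_data also understands the quantized
// block formats; this only illustrates where the "found nan value
// at block N" message comes from.
static bool validate_row_data_f32(const float * data, size_t n, size_t block_size) {
    for (size_t i = 0; i < n; ++i) {
        if (!std::isfinite(data[i])) {
            fprintf(stderr, "found nan value at block %zu\n", i / block_size);
            return false;
        }
    }
    return true;
}
```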

Nvm, Q3_K_L is also giving issues :')

Q3_K_M also gave the issue, but the IQ quants and Q3_K_S did not.

When it happens, it's on blk.60.ffn_down.weight:

blk.60.ffn_down.weight - [28672,  8192,     1,     1], type =    f32, converting to q5_K .. ggml_validate_row_data: found nan value at block 48

bartowski1182 · Jul 23 '24 23:07

Can you upload the files necessary to reproduce this issue?

slaren · Jul 24 '24 11:07

I have the imatrix here:

https://huggingface.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF/blob/main/Meta-Llama-3.1-70B-Instruct.imatrix

I can upload the f32, but I imagine it'll be faster for you to download the bf16 weights and convert locally than to download the gargantuan 260 GB files lol

bartowski1182 · Jul 24 '24 14:07

There are 1024 zeroes in the imatrix for this tensor. I can make a patch to ignore this, but I don't see how this could happen without something going wrong during the imatrix generation.

slaren · Jul 27 '24 23:07
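
To see why all-zero importance values produce a NaN: the k-quants choose each block's scale by weighted least squares, and the imatrix entries act as the weights. A simplified sketch of the failure mode (illustrative only, not the actual quantization code in ggml):

```cpp
#include <cstdint>

// Weighted least-squares fit of a block scale d so that x[i] ~ d * q[i].
// The imatrix supplies the weights w[i]. If every w[i] in the block is
// zero, numerator and denominator both vanish and 0.0f/0.0f yields NaN,
// which then propagates into the quantized tensor.
static float best_scale(const float * x, const float * w, const int8_t * q, int n) {
    float sumlx = 0.0f; // sum of w[i] * x[i] * q[i]
    float suml2 = 0.0f; // sum of w[i] * q[i] * q[i]
    for (int i = 0; i < n; ++i) {
        sumlx += w[i] * x[i] * q[i];
        suml2 += w[i] * q[i] * q[i];
    }
    return sumlx / suml2; // suml2 == 0 when all weights are zero -> NaN
}
```

A patch along the lines slaren mentions would presumably fall back to unweighted (or uniform) weights whenever a block's importance values are all zero.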

Interestingly, when I remade the imatrix just now I didn't get any NaN issues. I also made it with bf16 on CPU instead of f32 on GPU; could that have done something? Or maybe my original imatrix was just messed up 🤷‍♂️

bartowski1182 · Jul 27 '24 23:07