Bug: Quantizing Llama 3.1 70B to Q4_K_S with imatrix gives NaN
What happened?
Quantizing Llama 3.1 70B to Q4_K_S with an imatrix gives a NaN at block 48
Tagging @slaren because you always seem to solve these
Haven't seen it yet on any other quant size
Name and Version
b3441
What operating system are you seeing the problem on?
Linux
Relevant log output
ggml_validate_row_data: found nan value at block 48
Nvm, Q3_K_L is also giving issues :')
Q3_K_M also gave the issue, but the IQ quants and Q3_K_S did not
When it happens, it's on blk.60.ffn_down.weight:
blk.60.ffn_down.weight - [28672, 8192, 1, 1], type = f32, converting to q5_K .. ggml_validate_row_data: found nan value at block 48
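For context on that log line: ggml runs a sanity pass over each converted row and reports the first block containing a non-finite value. A minimal sketch of that kind of check (the real ggml_validate_row_data inspects the quantized blocks themselves; this is just the idea, with 256 as the k-quant super-block size):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Simplified illustration of the sanity check behind the log line:
// scan a row for non-finite values and report which block they fall in.
// (The real ggml_validate_row_data works on the quantized block data;
// this only shows the shape of the check.)
static bool validate_row_f32(const float * data, size_t n, size_t block_size) {
    for (size_t i = 0; i < n; ++i) {
        if (!std::isfinite(data[i])) {
            fprintf(stderr, "found nan value at block %zu\n", i / block_size);
            return false;
        }
    }
    return true;
}

int main() {
    std::vector<float> row(28672, 1.0f);           // one row of blk.60.ffn_down.weight
    row[48 * 256] = NAN;                           // poison block 48
    validate_row_f32(row.data(), row.size(), 256); // -> found nan value at block 48
}
```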
Can you upload the files necessary to reproduce this issue?
I have the imatrix here:
https://huggingface.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF/blob/main/Meta-Llama-3.1-70B-Instruct.imatrix
I can upload the f32, but I imagine it'll be faster for you to download the bf16 weights and convert locally than to download the gargantuan 260 GB files lol
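In case anyone wants to scan an imatrix for suspicious entries themselves, here's a rough standalone reader. The binary layout (entry count, then per entry: name length, name bytes, ncall, nval, and nval floats) is my reading of save_imatrix in examples/imatrix around this tag, so treat it as an assumption rather than a spec:

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Count zero entries per tensor in a legacy binary imatrix file.
// Assumed layout (from save_imatrix in examples/imatrix as of b3441):
//   int32 n_entries
//   per entry: int32 name_len, name bytes, int32 ncall, int32 nval,
//              nval floats (accumulated activation statistics)
int main(int argc, char ** argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s file.imatrix\n", argv[0]); return 1; }
    std::ifstream in(argv[1], std::ios::binary);
    int32_t n_entries = 0;
    in.read((char *) &n_entries, sizeof(n_entries));
    for (int32_t e = 0; e < n_entries && in; ++e) {
        int32_t len = 0, ncall = 0, nval = 0;
        in.read((char *) &len, sizeof(len));
        std::string name(len, '\0');
        in.read(&name[0], len);
        in.read((char *) &ncall, sizeof(ncall));
        in.read((char *) &nval, sizeof(nval));
        std::vector<float> vals(nval);
        in.read((char *) vals.data(), nval * sizeof(float));
        size_t zeros = 0;
        for (float v : vals) if (v == 0.0f) ++zeros;
        if (zeros > 0) printf("%s: %zu / %d zeroes\n", name.c_str(), zeros, (int) nval);
    }
}
```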
There are 1024 zeroes in the imatrix for this tensor. I can make a patch to ignore this, but I don't see how this could happen without something going wrong during the imatrix generation.
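To make the failure mode concrete: the imatrix values act as per-weight importances when the k-quant routines fit block scales, and a weighted fit divides by the sum of the weights. A toy sketch (not the actual ggml quantization code) of how an all-zero block yields 0/0:

```cpp
#include <cmath>
#include <cstdio>

// Toy weighted scale fit of the kind the k-quants use with an imatrix
// (not the actual ggml code). With all-zero importance weights the
// denominator is zero, the scale comes out NaN, and that is exactly
// what ggml_validate_row_data then flags.
static float weighted_scale(const float * x, const float * w, int n) {
    float sum_wx2 = 0.0f, sum_w = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum_wx2 += w[i] * x[i] * x[i];
        sum_w   += w[i];
    }
    return std::sqrt(sum_wx2 / sum_w); // 0/0 -> NaN when all w[i] == 0
}

int main() {
    float x[4] = {1.0f, -2.0f, 3.0f, -4.0f};
    float w[4] = {0.0f,  0.0f, 0.0f,  0.0f}; // zeroed imatrix entries
    printf("scale = %f\n", weighted_scale(x, w, 4)); // prints nan
}
```

A patch to ignore this would presumably substitute uniform weights (or skip the imatrix) for blocks whose importances are all zero.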
Interestingly, when I remade the imatrix just now I didn't get any NaN issues... I also made it with bf16 on CPU instead of f32 on GPU; could that have done something? Or maybe my original imatrix was just messed up 🤷‍♂️