
segfault during iq2_xxs quantisation

schmorp opened this issue 1 year ago · 3 comments

When doing an iq2_xxs quantisation of https://huggingface.co/Kquant03/BurningBruce-SOLAR-8x10.7B-bf16, `quantize` simply segfaults:

```
[ 391/ 483] blk.17.ffn_gate_exps.weight - [ 4096, 14336, 8, 1], type = f32, converting to iq2_xxs .. size = 1792.00 MiB -> 115.50 MiB
[ 392/ 483] blk.17.ffn_down_exps.weight - [14336, 4096, 8, 1], type = f32, converting to iq2_xxs .. size = 1792.00 MiB -> 115.50 MiB
[ 393/ 483] blk.17.ffn_up_exps.weight - [ 4096, 14336, 8, 1], type = f32, converting to iq2_xxs .. size = 1792.00 MiB -> 115.50 MiB
[ 394/ 483] blk.18.ffn_gate_exps.weight - [ 4096, 14336, 8, 1], type = f32, converting to iq2_xxs ..
/root/s2/quantize: line 178: 682015 Segmentation fault "$QUANTIZE" --allow-requantize $IMATRIX "$srcgguf" ./"$OUT.$HOSTNAME~" "$qmethod"
```

Tried it twice; it happens reproducibly. All Q* quants, as well as the IQ3* and IQ4* quants, were generated without issue.

The imatrix.dat was this one: https://huggingface.co/mradermacher/BurningBruce-SOLAR-8x10.7B-bf16-i1-GGUF/blob/main/imatrix.dat

The source gguf was created using `convert.py --vocab-type bpe,hfft,spm --pad-vocab --skip-unknown` (my current default args).

llama version: 2699

schmorp · Apr 22 '24 11:04

With a debug build:

```
[ 394/ 483] blk.18.ffn_gate_exps.weight - [ 4096, 14336, 8, 1], type = f32, converting to iq2_xxs ..
quantize: ggml-quants.c:1313: nearest_int: Assertion `fval <= 4194303.f' failed.
```
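For context on what that assertion guards: `nearest_int` in ggml-quants.c rounds a float to the nearest integer with the classic "magic constant" trick, which only works while the input magnitude stays below 2^22, hence the `fval <= 4194303.f` precondition. A minimal sketch of the technique (comments and framing are my interpretation, not a quote of the llama.cpp source):

```c
#include <assert.h>
#include <string.h>

/* Fast round-to-nearest via the float "magic number" trick.
 * Adding 1.5 * 2^23 (= 12582912) pins the float's exponent so that the
 * rounded integer lands in the low mantissa bits, which can then be
 * extracted with bit operations. This requires |fval| < 2^22, which is
 * exactly the precondition that failed in the debug build above; in a
 * release build the out-of-range value instead produces a bogus index
 * and can segfault downstream. */
static inline int nearest_int(float fval) {
    assert(fval <= 4194303.f);
    float val = fval + 12582912.f;   /* 12582912 = 1.5 * 2^23 */
    int i;
    memcpy(&i, &val, sizeof(int));   /* reinterpret the float's bits */
    return (i & 0x007fffff) - 0x00400000;
}
```

So the segfault report and the assertion are two faces of the same problem: some value fed into quantisation (weight scaled by the imatrix) is far larger than this rounding helper can handle.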

slaren · Apr 23 '24 12:04

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] · Jun 07 '24 01:06