BrickBee
> But when attempting to run an imatrix calculation

Same for me with some DeepSeek-based models, which Gorilla is based on. Inference for FP16 and Q8 works, but imatrix calculation...
> error loading model: llama.cpp: tensor 'layers.0.feed_forward.w1.weight' has wrong shape; expected 3200 x 8704, got 3200 x 8640

Same for me. It is also broken in the original commit (ffb06a345e3a9e30d39aaa5b46a23201a74be6de),...
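For context, the mismatch follows from the feed-forward rounding that the pre-GGUF loader applies. Below is a minimal sketch of that calculation, assuming the rounding formula from llama.cpp's loader of that era; the `n_mult` values are illustrative:

```python
# Sketch of the old llama.cpp n_ff rounding (pre-GGUF); n_mult values are illustrative.
def n_ff(n_embd: int, n_mult: int) -> int:
    return ((2 * (4 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult

print(n_ff(3200, 256))  # 8704 -> what the loader expects with the usual multiple of 256
print(n_ff(3200, 216))  # 8640 -> the feed-forward width the 3b checkpoint actually uses
```

Presumably the convert.py diff linked a couple of comments below resolves the same mismatch by writing header values for which this rounding lands on 8640 instead of 8704.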
I can confirm that the quantized files you've linked work fine with the release version you've linked. My quantized versions that I created at the time of...
Conversion and fp16 inference work after applying this [diff](https://huggingface.co/SlyEcho/open_llama_3b_ggml/blob/main/convert.py.diff). This was, by the way, the original point of this issue: the 3b model can't be used with the current code...
Potentially related to issue [8760](https://github.com/ggerganov/llama.cpp/issues/8760#issuecomment-2315639527), which also mentions the difference between (IQ1, IQ2, IQ3) and (IQ4 / K).
It might also improve performance if a setting were added to pin the router/weighting network to either VRAM or RAM (selectable). This could reduce the paging-in-from-disk time for...
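As a rough illustration of that idea (not an existing llama.cpp option), a mmap-based loader could hint the kernel to keep just the router tensors resident while leaving the expert tensors pageable. A minimal sketch, assuming Linux and a placeholder `router_tensor_ranges()` helper that would read the tensor offsets from the GGUF metadata:

```python
import mmap

def keep_router_resident(mm: mmap.mmap, offset: int, length: int) -> None:
    """Prefetch a byte range of the model file and hint the kernel to keep it paged in."""
    page = mmap.PAGESIZE
    start = offset - (offset % page)             # madvise() needs a page-aligned start
    mm.madvise(mmap.MADV_WILLNEED, start, offset + length - start)

with open("model.gguf", "rb") as f:              # placeholder file name
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    for off, size in router_tensor_ranges(mm):   # placeholder helper, not a real API
        keep_router_resident(mm, off, size)
```

A hard pin would need mlock() (or keeping those tensors in VRAM on the GPU side), but the effect is the same: the small gating weights stay hot while the large expert weights remain pageable.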
I did some token-generation testing on the 7950X3D with IQ and regular quants. It appears that the IQ quants are simply way more computationally expensive than the K quants,...
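For anyone who wants to reproduce that comparison, here is a minimal timing sketch using the llama-cpp-python bindings; the model file names are placeholders and the thread count would be matched to the CPU under test:

```python
import time
from llama_cpp import Llama  # llama-cpp-python bindings

def tokens_per_second(model_path: str, n_tokens: int = 128, n_threads: int = 16) -> float:
    llm = Llama(model_path=model_path, n_ctx=512, n_threads=n_threads, verbose=False)
    start = time.perf_counter()
    out = llm("Once upon a time", max_tokens=n_tokens, temperature=0.0)
    elapsed = time.perf_counter() - start
    return out["usage"]["completion_tokens"] / elapsed

for path in ["model-IQ4_XS.gguf", "model-Q4_K_M.gguf"]:  # placeholder file names
    print(path, round(tokens_per_second(path), 1), "t/s")
```

The llama-bench tool in the llama.cpp tree is the more standard way to measure this; the sketch only shows the shape of the measurement.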