Nexes the Elder

5 issues by Nexes the Elder

### Background Description

![Screenshot 2024-06-16 at 20-51-51 GGUF My Repo - a Hugg](https://github.com/ggerganov/llama.cpp/assets/124105151/9d853b81-8c43-4520-b8c9-f60a1f9c7299)

On HF, the GGUF My Repo scripts currently seem to use the old one, and LlamaCPP probably got...

stale

Here are a few edits I consider useful to slightly improve the IQ2 model quant strategies for some models:

- The tensor attn.v.weight passed in Q4_K for models like Gemma...

examples
Review Complexity : Low

This PR simply replicates the per-tensor custom quantization CLI feature that Ikawrakow introduced for the token embeddings and output tensors in #6239, extending it to:

- attn_q.weight
- attn_k.weight...

demo
examples
Review Complexity : Medium

Hey. I would like to merge Nemotron 3.1 onto a Llama 3.3 instruct base. From what I understand, Nemotron 3.1 is based on Llama 3.1 instruct. And...

Here's what I'd suggest for starters:

- Rationalize Q2_K_S ffn_down and attn_v (+1% size, -2.5% ppl)
- Bump attn_v and attn_k for Q2_K_S and Q2_K if GQA>=2. Uncripple attn_k...
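The GQA>=2 condition in the suggestion above can be sketched in a few lines. This is only an illustration, not code from llama.cpp: it assumes the GQA factor is the number of query heads divided by the number of key/value heads, and the function names `gqa_ratio` and `should_bump_attn_kv` are hypothetical. The sketch only decides whether the bump applies; which quant type the tensors would be bumped to is not stated in the snippet, so it is left out.

```python
def gqa_ratio(n_head: int, n_head_kv: int) -> int:
    """Number of query heads that share each key/value head."""
    if n_head <= 0 or n_head_kv <= 0 or n_head % n_head_kv != 0:
        raise ValueError("n_head must be a positive multiple of n_head_kv")
    return n_head // n_head_kv


def should_bump_attn_kv(ftype: str, n_head: int, n_head_kv: int) -> bool:
    """Hypothetical rule: bump attn_v and attn_k only for Q2_K_S / Q2_K
    when the model uses grouped-query attention with a factor of 2 or more."""
    return ftype in {"Q2_K_S", "Q2_K"} and gqa_ratio(n_head, n_head_kv) >= 2
```

For example, a model with 32 query heads and 8 key/value heads has a GQA factor of 4, so the rule would apply to it for Q2_K, while a model with 32 heads of each kind (factor 1) would be left alone.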