Nexes the Elder issues

Results: 5 issues of Nexes the Elder

Refactor: GGUF my Repo tool on HF needs its scripts updated with the new naming scheme

### Background Description

![Screenshot 2024-06-16 at 20-51-51 GGUF My Repo - a Hugg](https://github.com/ggerganov/llama.cpp/assets/124105151/9d853b81-8c43-4520-b8c9-f60a1f9c7299)

On HF, the GGUF My Repo scripts currently seem to use the old naming scheme, and LlamaCPP probably got...

stale

Changes for the existing quant strategies / FTYPEs and new ones

Here are a few edits I consider useful to slightly improve the IQ2 quant strategies for some models:
- The tensor attn.v.weight passed in Q4_K for models like Gemma...

examples

Review Complexity : Low

Quantize: specify each major tensor quant in CLI for common LLMs

This PR simply replicates the per-tensor custom quantization CLI feature introduced by Ikawrakow for the token embeddings and output tensors in #6239, extending it to:
- attn_q.weight
- attn_k.weight...

demo

examples

Review Complexity : Medium

Nemotron 3.3 attempt.

Hey. I would like to make a merge of Nemotron 3.1 on a Llama 3.3 instruct base. From what I understand, Nemotron 3.1 is based on Llama 3.1 instruct. And...

Some minor quant strategies tweaks

Here's what I'd suggest for starters:
- Rationalize Q2_K_S ffn_down and attn_v (+1% size, -2.5% ppl)
- Bump attn_v and attn_k for Q2_K_S and Q2_K if GQA>=2. Uncripple attn_k...