Nexes the Elder
### Background Description  On HF, GGUF my Repo scripts seem to currently use the old one, and LlamaCPP probably got...
Here are a few edits I consider useful to slightly improve the IQ2 quant strategies for some models: - The tensor attn.v.weight passed in Q4_K for models like Gemma...
This PR simply replicates the per-tensor custom quantization CLI feature brought by Ikawrakow for the token embeddings and output tensors in #6239 to: - attn_q.weight - attn_k.weight...
Hey. I would like to make a merge of Nemotron 3.1 on a Llama 3.3 instruct base. From what I understand, Nemotron 3.1 is based on Llama 3.1 instruct. And...
Here's what I'd suggest for starters: - Rationalize Q2_K_S ffn_down and attn_v (+1% size, -2.5% ppl) - Bump attn_v and attn_k for Q2_K_S and Q2_K if GQA>=2. Uncripple attn_k...