> They will be supported in the future, not sure when. There isn't huge interest because the i-matrix quants are noticeably slower during inference. And it takes a lot...
> Ah, I hadn't actually noticed they're that much slower than the K quants, maybe I should try running Q3_K_M instead of IQ3_XS on my MacBook 🤔 They're not
> To be honest, anything below Q4 is poor quality; it's better to pick a smaller model. There are other formats better suited to 2/3-bit than GGUF with 3 bit...
+1 to requesting support for the rest of the IQ quants. I'm especially interested in IQ4_NL, personally. An IQ4_NL quant of Command-R with 2K context fits and works on a...
> From what I understood, the IQ quants are just another format: you can quantize the model with it, but it will be very inefficient and you lose...
@mann1x I never "attacked" you, nor was I defending @sammcj. Like I said, I defended *his point*. Thanks for the PR. Are you giving up on IQ4_NL? Should someone else...
> According to this table: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
>
> The 8x22B model (which has roughly 141B parameters, be it WizardLM or not) would have **IQ3_XS** at 58GB, which may just be...
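For what it's worth, the 58GB figure lines up with a rough weights-only estimate, assuming IQ3_XS averages about 3.3 bits per weight (the bpw column in that gist). A quick sketch:

```python
# Back-of-envelope size estimate, weights only (illustrative, not exact):
# assumes IQ3_XS averages ~3.3 bits per weight, per the gist linked above.
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate quantized model size in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

print(f"{gguf_size_gb(141e9, 3.3):.1f} GB")  # ~58.2 GB for a 141B model at ~3.3 bpw
```

The real GGUF will differ slightly, since some tensors are kept at higher precision and metadata adds a bit of overhead.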
> > > According to this table: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
> > >
> > > The 8x22B model (which has roughly 141B parameters, be it WizardLM or not) would have **IQ3_XS** at 58GB, which...
> I have updated the PR to fix IQ4_NL support; I will add the benchmark to the table above.

Thank you
Same problem here