Truenorth8
Results: 2 comments of Truenorth8
@jlamypoirier These are great suggestions. Have any of these found their way upstream? If not, is your version available anywhere? edit: especially curious about > Compute the model head only...
@AniZpZ Existing methods (AWQ, GPTQ) go down to 4-bit quantization, saving lots of memory. The speed improvements of 8-bit inference come during inference, which could theoretically be combined with AWQ. Would...