fwtan
Hi, thanks for the great work!

This PR is an attempt to add 2-bit support for SqueezeLLM. It introduces two new kernels:

- `VecQuant2MatMulKernelNUQPerChannel`
- `VecQuant2MatMulKernelNUQPerChannelBatched`

We evaluated the 2-bit quantized Llama2-13b-hf...
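
For readers unfamiliar with the lookup-table approach, here is a rough reference sketch in plain Python of what a 2-bit non-uniform (NUQ) per-channel matrix-vector product computes. It is not the CUDA code from this PR: the packing layout (16 codes per 32-bit word, low bits first), the per-channel table of four values, and the helper names `pack_2bit`, `unpack_2bit`, and `nuq_matvec` are all assumptions made for illustration.

```python
# Reference sketch (pure Python) of a 2-bit NUQ per-channel matrix-vector
# product. The packing layout and all function names are assumptions for
# illustration; they are not taken from the PR's CUDA kernels.

CODES_PER_WORD = 16  # one 32-bit word holds 16 two-bit codes


def pack_2bit(codes):
    """Pack 2-bit codes (values 0..3) into 32-bit words, lowest bits first."""
    words = []
    for start in range(0, len(codes), CODES_PER_WORD):
        word = 0
        for offset, code in enumerate(codes[start:start + CODES_PER_WORD]):
            if not 0 <= code <= 3:
                raise ValueError("2-bit codes must be in 0..3")
            word |= code << (2 * offset)
        words.append(word)
    return words


def unpack_2bit(words, count):
    """Inverse of pack_2bit: recover the first `count` codes."""
    return [
        (words[i // CODES_PER_WORD] >> (2 * (i % CODES_PER_WORD))) & 0b11
        for i in range(count)
    ]


def nuq_matvec(x, packed_columns, lookup_tables):
    """Compute y[j] = sum_i x[i] * lookup_tables[j][code(i, j)].

    packed_columns[j] holds the packed codes of output channel j, and
    lookup_tables[j] holds that channel's four dequantized values.
    """
    y = []
    for words, table in zip(packed_columns, lookup_tables):
        codes = unpack_2bit(words, len(x))
        y.append(sum(xi * table[c] for xi, c in zip(x, codes)))
    return y


# Tiny example: 3 input features, 2 output channels.
x = [1.0, 2.0, -1.0]
packed = [pack_2bit([0, 3, 1]), pack_2bit([2, 2, 0])]
tables = [[-1.0, -0.25, 0.25, 1.0], [-2.0, -0.5, 0.5, 2.0]]
print(nuq_matvec(x, packed, tables))
```

Under these assumptions, each weight costs 2 bits of storage plus a small per-channel table, and dequantization is a single table lookup per weight. A batched variant would presumably apply the same lookup to several input vectors at once.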