Results 2 issues of fwtan

Hi, thanks for the great work! This PR is an attempt to add 2-bit support for SqueezeLLM. It introduces two new kernels, `VecQuant2MatMulKernelNUQPerChannel` and `VecQuant2MatMulKernelNUQPerChannelBatched`. We evaluated the 2-bit quantized Llama2-13b-hf...
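
For readers unfamiliar with what a 2-bit non-uniform (NUQ) per-channel kernel computes, here is a minimal pure-Python reference sketch. It is not the PR's CUDA code. The packing order (16 two-bit indices per 32-bit word, lowest bits first), the one-lookup-table-per-output-channel layout, and the names `pack_indices` and `matvec_nuq_per_channel` are all assumptions made for illustration.

```python
from typing import List

BITS = 2                  # bits per quantized weight
PER_WORD = 32 // BITS     # 16 indices fit in one 32-bit word
MASK = (1 << BITS) - 1    # 0b11, selects one 2-bit index


def pack_indices(indices: List[int]) -> List[int]:
    """Pack 2-bit indices (0..3) into 32-bit words, lowest bits first.

    The packing order is an assumption for this sketch.
    """
    assert len(indices) % PER_WORD == 0
    words = []
    for start in range(0, len(indices), PER_WORD):
        word = 0
        for j, idx in enumerate(indices[start:start + PER_WORD]):
            assert 0 <= idx <= MASK
            word |= idx << (BITS * j)
        words.append(word)
    return words


def matvec_nuq_per_channel(
    x: List[float],
    packed_cols: List[List[int]],
    lut: List[List[float]],
) -> List[float]:
    """Compute y[c] = sum_i x[i] * lut[c][index(i, c)].

    packed_cols[c] holds the packed indices of output channel c
    (hypothetical layout); lut[c] holds that channel's 4 centroids.
    """
    y = []
    for words, table in zip(packed_cols, lut):
        acc = 0.0
        for w, word in enumerate(words):
            for j in range(PER_WORD):
                # Extract the j-th 2-bit index and dequantize via the table.
                idx = (word >> (BITS * j)) & MASK
                acc += x[w * PER_WORD + j] * table[idx]
        y.append(acc)
    return y
```

A GPU kernel would typically parallelize the outer loop over output channels and keep the four-entry table in fast memory, but the arithmetic per channel is the same lookup-and-accumulate shown here.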