Tan Voon Tao


I'm using the kernel provided by `gemm_lowbit()` to run inference for my model evaluation, but the inference speed seems a bit too slow. I'm using this for my classification task....

```
import torch
import torch.nn as nn

class BitLinearInference(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.in_f = in_features
        self.out_f = out_features
        # Quantized weight and its scale are stored as buffers (not parameters),
        # since they are frozen at inference time.
        self.register_buffer("w", torch.empty((out_features, in_features)))
        self.register_buffer("w_scale", torch.empty((1,), dtype=torch.float32))
        self.norm = nn.RMSNorm(
            normalized_shape=in_features, eps=1e-5, elementwise_affine=True
        )
        # ...
```
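To narrow down whether the slowdown comes from the kernel itself or from the surrounding layer (norm, quantization, Python overhead), it may help to time the kernel in isolation against a plain float matmul of the same shape. Below is a minimal timing harness, a sketch only: `gemm_lowbit` in the usage comment stands for the repo's kernel and its exact signature is assumed, and the `sync` hook (e.g. `torch.cuda.synchronize`) is only needed when timing asynchronous GPU work.

```python
import time

def bench(fn, *args, warmup=3, iters=20, sync=None):
    """Return the mean seconds per call of fn(*args).

    Runs `warmup` untimed calls first, then times `iters` calls.
    `sync` is an optional callable (e.g. torch.cuda.synchronize)
    to flush pending asynchronous work before reading the clock.
    """
    for _ in range(warmup):
        fn(*args)
    if sync is not None:
        sync()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if sync is not None:
        sync()
    return (time.perf_counter() - t0) / iters

# Hypothetical usage (shapes and argument order are assumptions):
# lowbit_s = bench(gemm_lowbit, x_q, w_q, sync=torch.cuda.synchronize)
# float_s  = bench(torch.matmul, x, w.t(), sync=torch.cuda.synchronize)
# print(f"lowbit {lowbit_s*1e3:.3f} ms vs float {float_s*1e3:.3f} ms")
```

If the low-bit kernel alone is already slower than the float matmul at your problem sizes, the issue is the kernel/shape combination; if not, the overhead is likely in the Python-side layer code.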