Tan Voon Tao


I'm using the kernel provided by `gemm_lowbit()` to run inference for my model evaluation, but the inference speed seems a bit too slow. I'm using this for my classification task....

```
import torch
import torch.nn as nn

class BitLinearInference(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.in_f = in_features
        self.out_f = out_features
        # Quantized weight and its scale are stored as buffers (not parameters),
        # since they are frozen at inference time.
        self.register_buffer("w", torch.empty((out_features, in_features)))
        self.register_buffer("w_scale", torch.empty((1,), dtype=torch.float32))
        self.norm = nn.RMSNorm(
            normalized_shape=in_features, eps=1e-5, elementwise_affine=True
        )
        # ...
```
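To narrow down whether the slowdown comes from the kernel itself or from the surrounding layer (norm, quantization, Python overhead), it may help to time the kernel in isolation against a plain float matmul of the same shape. Below is a minimal timing harness, a sketch only: `gemm_lowbit` in the usage comment stands for the repo's kernel and its exact signature is assumed, and the `sync` hook (e.g. `torch.cuda.synchronize`) is only needed when timing asynchronous GPU work.

```python
import time

def bench(fn, *args, warmup=3, iters=20, sync=None):
    """Return the mean seconds per call of fn(*args).

    Runs `warmup` untimed calls first, then times `iters` calls.
    `sync` is an optional callable (e.g. torch.cuda.synchronize)
    to flush pending asynchronous work before reading the clock.
    """
    for _ in range(warmup):
        fn(*args)
    if sync is not None:
        sync()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if sync is not None:
        sync()
    return (time.perf_counter() - t0) / iters

# Hypothetical usage (shapes and argument order are assumptions):
# lowbit_s = bench(gemm_lowbit, x_q, w_q, sync=torch.cuda.synchronize)
# float_s  = bench(torch.matmul, x, w.t(), sync=torch.cuda.synchronize)
# print(f"lowbit {lowbit_s*1e3:.3f} ms vs float {float_s*1e3:.3f} ms")
```

If the low-bit kernel alone is already slower than the float matmul at your problem sizes, the issue is the kernel/shape combination; if not, the overhead is likely in the Python-side layer code.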