BitNet
Official inference framework for 1-bit LLMs
While reviewing the code in this repository, I noticed a few areas that could be optimized for efficiency. I decided to make some changes to how the models are loaded...
Add GEMM kernel for int2 weights. Also fix scaling problems in the previous BitLinear kernel.
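For context, an int2 GEMM operates on weights packed four-per-byte. A minimal sketch of that packing for ternary weights, where the function names and the low-bits-first layout are assumptions for illustration, not the kernel's actual format:

```python
def pack_int2(weights):
    """Pack ternary weights {-1, 0, 1} four-per-byte.

    Each weight is offset by +1 to fit the unsigned 2-bit range {0, 1, 2}.
    Low bits hold the first weight of each group (layout is an assumption).
    """
    assert len(weights) % 4 == 0
    packed = bytearray()
    for i in range(0, len(weights), 4):
        b = 0
        for j, w in enumerate(weights[i:i + 4]):
            b |= (w + 1) << (2 * j)
        packed.append(b)
    return bytes(packed)

def unpack_int2(packed, n):
    """Inverse of pack_int2: recover the first n ternary weights."""
    return [((byte >> (2 * j)) & 0b11) - 1
            for byte in packed for j in range(4)][:n]

ws = [-1, 0, 1, 1, 0, -1, 1, 0]
assert unpack_int2(pack_int2(ws), len(ws)) == ws  # round-trips exactly
```

Packing four weights per byte is what makes the 1.58-bit representation memory-efficient; the GEMM kernel then unpacks and scales on the fly.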
Fixed error in `setup_env.py` that prevented downloading of **_Falcon3 Base_** models
Hi team, I'm trying to use this model locally on Windows with minimal system dependencies, ideally via llama.cpp. In OpenAI Playground, we can set a System Instruction and the model...
A bug has been reported, but further details are needed to provide actionable steps for reproduction and resolution. Please specify the observed behavior, expected outcome, and any relevant environment or...
specify alignment of A_local
```python
import torch
import torch.nn as nn

class BitLinearInference(nn.Module):
    def __init__(
        self,
        in_features: int,
        out_features: int,
    ):
        super().__init__()
        self.in_f = in_features
        self.out_f = out_features
        # Quantized weight and its per-tensor scale are registered as
        # buffers rather than parameters: they are frozen at inference time.
        self.register_buffer("w", torch.empty((out_features, in_features)))
        self.register_buffer("w_scale", torch.empty((1,), dtype=torch.float32))
        self.norm = nn.RMSNorm(
            normalized_shape=in_features,
            eps=1e-5,
            elementwise_affine=True,
        )
        ...
```
# Description

BitNet.cpp is nearly impossible to deploy on a low-end ARM device due to the low I/O rate of the chipset. By introducing parallel compiling, this issue could be...