
0️⃣1️⃣🤗 BitNet-Transformers: Hugging Face Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with the Llama(2) architecture

9 BitNet-Transformers issues

Training may be on GPU, but deployment has to be on CPU for high scalability.

I just want to reproduce the paper results. Since the paper uses only the BitNet Transformer, I wonder if I can replace the FC (nn.Linear) layers with this BitLinear in a Transformer.
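For context, a minimal sketch of swapping every nn.Linear in a model for BitLinear; the import path and the assumption that BitLinear is a drop-in nn.Linear replacement are mine, not necessarily this repo's exact API.

```python
# Minimal sketch: recursively swap every nn.Linear in a model for BitLinear.
# The import path and BitLinear signature are assumptions (a drop-in
# nn.Linear-compatible layer), not necessarily this repo's exact API.
import torch.nn as nn
from bit_llama import BitLinear  # hypothetical import path


def replace_linear_with_bitlinear(module: nn.Module) -> None:
    """Replace nn.Linear submodules with BitLinear, copying weights over."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            bit = BitLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            bit.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                bit.bias.data.copy_(child.bias.data)
            setattr(module, name, bit)
        else:
            replace_linear_with_bitlinear(child)
```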

Hello. First of all, thank you for sharing the code. I have one question about your work. I am wondering if you checked the accuracy after training was completed. When...

Hi, I have a question about your BitLinear.forward() implementation. The BitNet paper says the output should take the form y = binarized_weight(W) @ AbsMaxQuant(LN(x)) * beta*gamma/Q_b (LN is...
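For reference, a minimal sketch of that equation as I read the paper, with beta = mean|W|, gamma = max|x|, and Q_b = 2^(b-1); this illustrates the formula itself, not necessarily what this repo's BitLinear.forward() does.

```python
# Sketch of the forward pass as described in the BitNet paper (b = 8 activation bits):
#   y = sign(W - mean(W)) @ AbsMaxQuant(LN(x)) * (beta * gamma / Q_b)
# This is an illustration of the equation, not this repo's exact implementation.
import torch
import torch.nn.functional as F


def bitlinear_forward(x, weight, eps=1e-5, bits=8):
    Q_b = 2 ** (bits - 1)
    # LayerNorm over the feature dimension (no affine parameters, for simplicity)
    x_norm = F.layer_norm(x, x.shape[-1:], eps=eps)
    # Absmax-quantize activations into [-Q_b + eps, Q_b - eps]
    gamma = x_norm.abs().max()
    x_q = torch.clamp(x_norm * Q_b / (gamma + eps), -Q_b + eps, Q_b - eps)
    # Binarize zero-centered weights to {-1, +1}; beta rescales the output
    beta = weight.abs().mean()
    w_bin = torch.sign(weight - weight.mean())
    # Dequantize the output with beta * gamma / Q_b
    return F.linear(x_q, w_bin) * (beta * gamma / Q_b)
```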

I took the code for BitLinearOptimized and added a small thing so I can run it standalone: `super(BitLinearOptimized, self).__init__(in_features, out_features, bias, dtype=torch.bfloat16)  # just added the right dtype`. Running the...
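For context, a hypothetical standalone run after that dtype change, assuming BitLinearOptimized behaves like nn.Linear with bfloat16 weights; the import path and sizes are placeholders.

```python
# Hypothetical standalone run after the dtype change above; assumes
# BitLinearOptimized behaves like nn.Linear with bfloat16 weights.
import torch
from bit_llama import BitLinearOptimized  # hypothetical import path

layer = BitLinearOptimized(512, 512, bias=False)    # dtype fix applied in __init__
x = torch.randn(2, 16, 512, dtype=torch.bfloat16)   # match the bfloat16 weights
y = layer(x)
print(y.shape, y.dtype)
```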

Huggingface -> Hugging Face

Just wondering when you were planning on implementing the BitLinear layer with true 1-bit weights and a custom CUDA kernel for the 1-bit weights? Super thirsty for the code, ha. Appreciate you.
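For reference, a rough pure-PyTorch sketch of the storage half of that request, packing sign-binarized weights into 1 bit each; computing directly on the packed format would still need the custom CUDA kernel, and all names here are illustrative.

```python
# Sketch of packing sign-binarized weights into 1 bit each (uint8 storage).
# This only covers storage; a custom CUDA kernel would be needed to compute
# on the packed format directly. Function names here are illustrative.
import torch
import torch.nn.functional as F


def pack_signs(weight: torch.Tensor) -> torch.Tensor:
    """Map {-1, +1} signs to bits and pack 8 of them per uint8 along the last dim."""
    bits = (torch.sign(weight) > 0).to(torch.uint8)  # 1 for +1, 0 for -1/0
    pad = (-bits.shape[-1]) % 8
    if pad:
        bits = F.pad(bits, (0, pad))
    bits = bits.reshape(*bits.shape[:-1], -1, 8)
    shifts = torch.arange(8, dtype=torch.uint8, device=bits.device)
    return (bits << shifts).sum(dim=-1).to(torch.uint8)


def unpack_signs(packed: torch.Tensor, n: int) -> torch.Tensor:
    """Inverse of pack_signs: recover a {-1, +1} float tensor of width n."""
    shifts = torch.arange(8, dtype=torch.uint8, device=packed.device)
    bits = (packed.unsqueeze(-1) >> shifts) & 1
    return bits.reshape(*packed.shape[:-1], -1)[..., :n].float() * 2 - 1
```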

I was testing in Colab, and when I ran `model.model.layers[0].mlp.gate_proj.weight`, I received very different results from yours. You got: Parameter containing: tensor([[ 0.0032, -0.0339, 0.0150, ..., 0.0041, -0.0048, 0.0061], [-0.0105,...
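For context, a sketch of that kind of inspection; the checkpoint name is a placeholder, and freshly initialized (untrained) weights will differ between runs unless the same pretrained checkpoint or seed is used.

```python
# Sketch of inspecting the first layer's gate_proj weights; the checkpoint
# name is a placeholder. Values will differ if the model is randomly
# initialized rather than loaded from the same pretrained checkpoint.
import torch
from transformers import LlamaForCausalLM

torch.manual_seed(0)  # only makes random initialization reproducible
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder
print(model.model.layers[0].mlp.gate_proj.weight)
```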