Training code
I see the inference code. Can you please share the training code?
You can use this model for post training. https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16
Hi, I am interested in post-training this model. May I ask some questions?
Is it the same as, or highly similar to, training other models like Llama or Qwen? Or could you share sample training code?
Also, one thing I noticed is that the original Hugging Face model card mentions:
Quantization: Native 1.58-bit weights and 8-bit activations (W1.58A8).
Weights are quantized to ternary values {-1, 0, +1} using absmean quantization during the forward pass.
Activations are quantized to 8-bit integers using absmax quantization (per-token).
Crucially, the model was trained from scratch with this quantization scheme, not post-training quantized.
Is there any open-source training code related to this?
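For reference, here is a minimal sketch of that W1.58A8 scheme in PyTorch, following the description quoted above (absmean ternary weights, per-token absmax int8 activations, with a straight-through estimator so gradients still update the full-precision master weights). This is an illustration based on the model card and the BitNet b1.58 paper, not the official training code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    """Absmean quantization of weights to ternary {-1, 0, +1} with a per-tensor scale."""
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale

def activation_quant(x: torch.Tensor) -> torch.Tensor:
    """Per-token absmax quantization of activations to the 8-bit range [-128, 127]."""
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale

class BitLinear(nn.Linear):
    """Linear layer with W1.58A8 fake quantization in the forward pass.

    The straight-through estimator, x + (q - x).detach(), makes the forward
    pass use the quantized values while the gradient bypasses the rounding
    ops, so the fp16/bf16 master weights keep receiving updates.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_q = x + (activation_quant(x) - x).detach()
        w_q = self.weight + (weight_quant(self.weight) - self.weight).detach()
        return F.linear(x_q, w_q, self.bias)
```

Note the paper also applies a normalization (SubLN/RMSNorm) to the activations before quantization inside BitLinear; that detail is omitted here for brevity.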
Yes, I understand which model to use for post-training.
What I want is continued pretraining, at large scale, maybe 20T tokens.
Could you please share the pretraining code?
My intent is to scale it up.
I am trying SFT for my downstream task. I think `Trainer` from `trl` may work.
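If it helps, a minimal sketch of that approach with trl's `SFTTrainer` (the dataset here is a placeholder for your downstream-task data, and the exact trainer arguments vary across trl versions):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Fine-tune the full-precision (bf16) master-weight checkpoint linked above.
model_id = "microsoft/bitnet-b1.58-2B-4T-bf16"
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder dataset: swap in your own downstream-task data.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="bitnet-sft"),
    train_dataset=dataset,
)
trainer.train()
```

Note that plain SFT like this updates the full-precision weights without the W1.58A8 fake-quantization forward pass; to keep the fine-tuned model faithful to 1.58-bit inference, you would swap in quantization-aware linear layers such as the BitLinear sketch above.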
After SFT, the model weights would still be full precision (fp16/bf16). How do you get the b1.58 weights?
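One plausible recipe, assuming the same absmean rule described in the model card is applied offline to the fine-tuned master weights. This is an assumption, not an official conversion script; Microsoft's released 1.58-bit checkpoint was produced with their own tooling (e.g. for bitnet.cpp):

```python
import torch

def export_ternary(w: torch.Tensor):
    """Convert a full-precision weight matrix to (ternary int8 tensor, scale).

    Uses the same absmean rule as training-time fake quantization, so the
    dequantized weights, ternary / scale, match what the quantization-aware
    forward pass would have used.
    """
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    ternary = (w * scale).round().clamp(-1, 1).to(torch.int8)
    return ternary, scale

# Example: quantize the 2D weight matrices in a fine-tuned state dict
# (hypothetical path; rough filter -- in practice you would exclude
# embeddings, norms, and the LM head, which stay in higher precision).
state_dict = torch.load("bitnet-sft/pytorch_model.bin")
packed = {
    name: export_ternary(w)
    for name, w in state_dict.items()
    if name.endswith("weight") and w.dim() == 2
}
```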