
training code

Open ehartford opened this issue 9 months ago • 5 comments

I see the inference code - Can you please share the training code?

ehartford avatar Apr 21 '25 05:04 ehartford

You can use this model for post training. https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16

sd983527 avatar Apr 24 '25 08:04 sd983527

> You can use this model for post training. https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16

Hi, I am interested in post-training this model. May I ask some questions?

Is it the same as, or very similar to, training other models like Llama or Qwen? Or could you share sample training code?

One thing I noticed is that the original Hugging Face model card mentions:

> Quantization: Native 1.58-bit weights and 8-bit activations (W1.58A8).
> Weights are quantized to ternary values {-1, 0, +1} using absmean quantization during the forward pass.
> Activations are quantized to 8-bit integers using absmax quantization (per-token).
> Crucially, the model was trained from scratch with this quantization scheme, not post-training quantized.

Is there any open-source training code related to this?
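For reference, the scheme described in the model card (absmean ternary weights, per-token absmax int8 activations) can be sketched in NumPy. This is a rough illustration of the math only, not the official training code:

```python
import numpy as np

def absmean_quantize_weights(W, eps=1e-6):
    """Quantize a weight matrix to ternary {-1, 0, +1} with an absmean
    scale, as described for BitNet b1.58 (illustrative helper, not the
    official implementation)."""
    gamma = np.abs(W).mean() + eps              # absmean scale
    Wq = np.clip(np.round(W / gamma), -1, 1)    # ternary codes
    return Wq, gamma                            # dequantize as Wq * gamma

def absmax_quantize_activations(x, bits=8, eps=1e-6):
    """Per-token absmax quantization of activations to signed int8 range."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8-bit
    scale = np.abs(x).max(axis=-1, keepdims=True) / qmax + eps
    xq = np.clip(np.round(x / scale), -qmax, qmax)
    return xq, scale
```

During quantization-aware training, these quantizers run in the forward pass (with a straight-through estimator for gradients) while the master weights stay full precision.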

hbj52152 avatar Apr 24 '25 12:04 hbj52152

Yes I understand which model to use for post training.

Me, I want to do continued pretraining. Large scale. 20T tokens maybe.

Could you please share the pretraining code?

My intent is to scale it up.

ehartford avatar Apr 24 '25 18:04 ehartford

I am trying SFT for my downstream task. I think Trainer from trl may work.
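If you go the trl route, a minimal sketch might look like the following. The dataset path, prompt template, and hyperparameters are placeholders of mine, not anything confirmed in this thread:

```python
# Rough sketch of SFT on the bf16 checkpoint using trl's SFTTrainer.
# Dataset file, template, and hyperparameters below are placeholders.

def format_example(example):
    # Placeholder prompt/response template; adapt field names to your data.
    return (
        "### Instruction:\n" + example["prompt"] +
        "\n### Response:\n" + example["response"]
    )

def main():
    # Imports kept inside main() so the template above can be reused
    # without trl/datasets installed.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("json", data_files="sft_data.json", split="train")
    trainer = SFTTrainer(
        model="microsoft/bitnet-b1.58-2B-4T-bf16",  # full-precision checkpoint
        args=SFTConfig(
            output_dir="bitnet-sft",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            learning_rate=1e-5,
        ),
        train_dataset=dataset,
        formatting_func=format_example,
    )
    trainer.train()

if __name__ == "__main__":
    main()
```

Note this fine-tunes the full-precision weights; you would still need to re-quantize afterwards to get a b1.58 checkpoint.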

LiuZhihhxx avatar Apr 28 '25 03:04 LiuZhihhxx

> I am trying SFT for my downstream task. I think Trainer from trl may work.

After SFT, the model weights are still fp16. How do you get the b1.58 weights?
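Since the master weights stay full precision during quantization-aware training, one plausible export step is to apply the same forward-pass absmean quantizer once, offline, and store ternary codes plus a per-tensor scale. A sketch (my assumption, not the official export path; real kernels pack the codes differently):

```python
import numpy as np

def export_ternary(W, eps=1e-6):
    """Offline export of a trained full-precision weight matrix to b1.58
    form: int8 ternary codes plus one absmean scale (illustrative only)."""
    gamma = np.abs(W).mean() + eps
    codes = np.clip(np.round(W / gamma), -1, 1).astype(np.int8)
    return codes, gamma

def dequantize(codes, gamma):
    # Reconstruct the effective weights the quantized forward pass uses.
    return codes.astype(np.float32) * gamma
```

Because the model was trained with this quantizer in the loop, the dequantized weights should behave close to the fine-tuned full-precision ones, unlike naive post-training quantization.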

yuimo avatar May 20 '25 09:05 yuimo