Training code
I see the inference code. Can you please share the training code?
You can use this model for post training. https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16
Hi, I am interested in post-training this model. May I ask some questions?
Is it the same as, or highly similar to, training other models like Llama or Qwen? Or could you share sample training code?
Also, one thing I noticed is that the original Hugging Face model card mentions:
Quantization: Native 1.58-bit weights and 8-bit activations (W1.58A8).
Weights are quantized to ternary values {-1, 0, +1} using absmean quantization during the forward pass.
Activations are quantized to 8-bit integers using absmax quantization (per-token).
Crucially, the model was trained from scratch with this quantization scheme, not post-training quantized.
Is there any open-source training code related to this?
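For reference, here is a minimal sketch of that W1.58A8 scheme in PyTorch, following the description quoted above (absmean ternary weights, per-token absmax int8 activations, with a straight-through estimator so gradients still update the full-precision master weights). This is an illustration based on the model card and the BitNet b1.58 paper, not the official training code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    """Absmean quantization of weights to ternary {-1, 0, +1} with a per-tensor scale."""
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale

def activation_quant(x: torch.Tensor) -> torch.Tensor:
    """Per-token absmax quantization of activations to the 8-bit range [-128, 127]."""
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale

class BitLinear(nn.Linear):
    """Linear layer with W1.58A8 fake quantization in the forward pass.

    The straight-through estimator, x + (q - x).detach(), makes the forward
    pass use the quantized values while the gradient bypasses the rounding
    ops, so the fp16/bf16 master weights keep receiving updates.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_q = x + (activation_quant(x) - x).detach()
        w_q = self.weight + (weight_quant(self.weight) - self.weight).detach()
        return F.linear(x_q, w_q, self.bias)
```

Note the paper also applies a normalization (SubLN/RMSNorm) to the activations before quantization inside BitLinear; that detail is omitted here for brevity.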
Yes, I understand which model to use for post-training.
What I want is continued pretraining, at large scale, maybe 20T tokens.
Could you please share the pretraining code?
My intent is to scale it up.
I am trying SFT for my downstream task. I think `Trainer` from `trl` may work.
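If it helps, a minimal sketch of that approach with trl's `SFTTrainer` (the dataset here is a placeholder for your downstream-task data, and the exact trainer arguments vary across trl versions):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Fine-tune the full-precision (bf16) master-weight checkpoint linked above.
model_id = "microsoft/bitnet-b1.58-2B-4T-bf16"
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder dataset: swap in your own downstream-task data.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="bitnet-sft"),
    train_dataset=dataset,
)
trainer.train()
```

Note that plain SFT like this updates the full-precision weights without the W1.58A8 fake-quantization forward pass; to keep the fine-tuned model faithful to 1.58-bit inference, you would swap in quantization-aware linear layers such as the BitLinear sketch above.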
After SFT, the model weights would still be full precision (fp16/bf16). How do you get the b1.58 weights?
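One plausible recipe, assuming the same absmean rule described in the model card is applied offline to the fine-tuned master weights. This is an assumption, not an official conversion script; Microsoft's released 1.58-bit checkpoint was produced with their own tooling (e.g. for bitnet.cpp):

```python
import torch

def export_ternary(w: torch.Tensor):
    """Convert a full-precision weight matrix to (ternary int8 tensor, scale).

    Uses the same absmean rule as training-time fake quantization, so the
    dequantized weights, ternary / scale, match what the quantization-aware
    forward pass would have used.
    """
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    ternary = (w * scale).round().clamp(-1, 1).to(torch.int8)
    return ternary, scale

# Example: quantize the 2D weight matrices in a fine-tuned state dict
# (hypothetical path; rough filter -- in practice you would exclude
# embeddings, norms, and the LM head, which stay in higher precision).
state_dict = torch.load("bitnet-sft/pytorch_model.bin")
packed = {
    name: export_ternary(w)
    for name, w in state_dict.items()
    if name.endswith("weight") and w.dim() == 2
}
```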