BitNet
Fine-tune the 1.58-bit model
I want to continue pretraining the 1.58-bit 2B model to add more data in my language, or fine-tune it for specific knowledge.
Is there any base code I could start with to train a 1.58-bit model? I've read the paper, and it uses an unusual method to compute the gradient of the ternary parameters.
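
For context, my understanding is that the gradient method in question is the straight-through estimator (STE): full-precision "latent" weights are kept, they are quantized to ternary values with an absmean scale in the forward pass, and the gradient computed for the quantized weights is applied directly to the latent weights. Below is a minimal NumPy sketch of that idea for a single linear layer with a squared-error loss. The function names (`absmean_ternarize`, `forward`, `ste_step`), the loss, and the learning rate are my own illustrative assumptions, not code from the BitNet repository, and activation quantization is left out.

```python
import numpy as np

def absmean_ternarize(w, eps=1e-5):
    """Quantize weights to {-1, 0, +1} using an absmean scale.

    Returns the ternary matrix and the scale, so that w is
    approximated by w_t * scale.
    """
    scale = np.mean(np.abs(w)) + eps
    w_t = np.clip(np.round(w / scale), -1, 1)
    return w_t, scale

def forward(x, w_latent):
    """Linear layer that uses the ternarized weights in the forward pass."""
    w_t, scale = absmean_ternarize(w_latent)
    return x @ (w_t * scale).T

def ste_step(x, y, w_latent, lr=0.1):
    """One SGD step with a straight-through estimator (illustrative only).

    The loss is 0.5 * mean squared error over the batch. The gradient is
    computed with respect to the quantized weights and then applied to the
    latent full-precision weights, i.e. the quantizer is treated as the
    identity in the backward pass.
    """
    err = forward(x, w_latent) - y            # shape: (batch, out_features)
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    grad_w_quant = err.T @ x / x.shape[0]     # dLoss / d(quantized weights)
    new_w_latent = w_latent - lr * grad_w_quant  # STE: pass gradient through
    return new_w_latent, loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(2, 3))               # latent weights (out=2, in=3)
    x = rng.normal(size=(8, 3))
    y = rng.normal(size=(8, 2))
    for step in range(5):
        w, loss = ste_step(x, y, w)
        print(step, float(loss))
```

If this matches what the paper does, then continued pretraining or fine-tuning would mean updating the latent weights this way and re-quantizing on every forward pass, so a checkpoint that still contains (or can be converted back to) full-precision latent weights would be needed. Please correct me if I have misread the method.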