Request for pretraining script
Appreciate your great work!
I would like to pre-train the model from scratch, including the tokenizer. run_mlm.py does not seem to use the BPE tokenizer, so could you please share the exact pretraining script so that I can follow the same steps you used?
Thank you very much!
I would also like to request the pretraining script.
Same here! The pretraining script would be much appreciated!
Hi, do you have the full pre-training code yet?
I don't, but I've started trying to implement the pretraining using the instructions given in the paper, as well as MosaicBERT and ModernBERT.
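For the tokenizer-from-scratch part mentioned above, here is a minimal sketch of training a byte-level BPE tokenizer with the Hugging Face `tokenizers` library. This is just my own starting point, not the authors' actual recipe — the corpus, vocab size, and special tokens are placeholders you would need to match to the paper's setup.

```python
# Hypothetical sketch: train a byte-level BPE tokenizer from scratch.
# The corpus lines, vocab_size, and special tokens below are placeholders,
# not the settings used in the paper.
from tokenizers import ByteLevelBPETokenizer

# Replace this toy iterator with an iterator over your real pretraining corpus.
corpus = [
    "this is an example sentence for tokenizer training",
    "another example sentence from the pretraining corpus",
]

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    corpus,
    vocab_size=1000,        # placeholder; the paper's value may differ
    min_frequency=1,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# Quick sanity check: encode a sentence with the freshly trained tokenizer.
encoding = tokenizer.encode("this is an example sentence")
print(encoding.tokens)
```

The resulting tokenizer can be saved with `tokenizer.save_model(...)` and loaded into a `transformers` fast tokenizer for the MLM pretraining run itself.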