Request for pretraining script

Open AstroSign opened this issue 1 year ago • 4 comments

Appreciate your great work!

I would like to pretrain the model from scratch, including the tokenizer. The run_mlm.py script does not seem to use the BPE tokenizer. Could you please share the exact pretraining script so that I can follow the same steps you used?

Thank you very much!

AstroSign avatar Feb 06 '25 21:02 AstroSign

I would also like to request the pretraining script.

Lucas9909 avatar Mar 06 '25 19:03 Lucas9909

Same here! The pretraining script would be much appreciated!

buschjo avatar Mar 10 '25 15:03 buschjo

Hi, do you have the full pre-training code yet?

ychuest avatar Mar 13 '25 15:03 ychuest

I don't, but I've started implementing the pretraining based on the instructions in the paper, as well as on MosaicBERT and ModernBERT.

buschjo avatar Mar 13 '25 16:03 buschjo
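
For the tokenizer part of the original question, the following is a minimal pure-Python sketch of how BPE merge rules can be learned from raw DNA sequences. It is not the DNABERT_2 authors' pretraining code, which has not been shared in this thread. The function names (`learn_bpe_merges`, `apply_merge`, `tokenize`) and the toy sequences are hypothetical and only illustrate the general technique; a real run would use a tokenizer library and a full genome corpus.

```python
from collections import Counter


def apply_merge(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe_merges(sequences, num_merges):
    """Learn up to `num_merges` BPE merge rules from DNA strings (illustrative only)."""
    # Start from single-nucleotide tokens.
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across the whole corpus.
        pair_counts = Counter()
        for tokens in corpus:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        # Most frequent pair wins; ties are broken lexicographically for determinism.
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        if pair_counts[best] < 2:
            break  # nothing repeats, so further merges would not compress anything
        merges.append(best)
        corpus = [apply_merge(tokens, best) for tokens in corpus]
    return merges


def tokenize(sequence, merges):
    """Tokenize a sequence by applying the learned merges in the order they were learned."""
    tokens = list(sequence)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


# Toy data, assumed purely for illustration.
toy_sequences = ["ATATAT", "ATGC"]
toy_merges = learn_bpe_merges(toy_sequences, num_merges=2)
toy_tokens = tokenize("ATATGC", toy_merges)
```

The learned merge list plays the role of the tokenizer vocabulary: the same merges, applied in the same order, must be used both when pretraining with the masked language modeling objective and at inference time.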