checkpoints for further pretraining
Hi, congratulations on the awesome work.
I was planning on further pretraining your models in other languages, since your version focuses mainly on English. However, I am having trouble starting a training run (from main.py) that initializes the model weights with the ones from modernbert-base. Are the Composer checkpoints available somewhere? Or is it possible to start a training run in Composer using the weights on Hugging Face in some way? I would greatly appreciate any guidance on this.
Hello,
We are planning to also release intermediate checkpoints (which could be more appropriate for your needs, especially the pre-decay ones) in early January (right now the team is resting a bit and scattered due to vacations).
The HF checkpoints are derived from the Composer checkpoints through a conversion function (`write_huggingface_pretrained_from_composer_checkpoint`), but I am not sure this is properly doable the other way around (especially for training states). I will have a look at whether it is doable and, if not, also release the Composer checkpoints one way or another!
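To illustrate why the reverse direction only partially works: a full Composer checkpoint nests the model weights inside a larger training-state dict, so HF weights can in principle be re-wrapped, but the optimizer and scheduler states are simply absent from the HF export. The sketch below uses dummy tensors and assumes the usual Composer layout (`state["model"]`); the key names and layout are assumptions, not something the repo guarantees:

```python
import torch

# Dummy stand-in for weights pulled from a Hugging Face model
# (in practice: AutoModelForMaskedLM.from_pretrained(...).state_dict()).
hf_state_dict = {
    "model.embeddings.weight": torch.zeros(4, 2),
    "model.layers.0.attn.Wqkv.weight": torch.zeros(6, 2),
}

# Composer full checkpoints nest the model weights under state["model"].
# The surrounding training state (optimizers, schedulers, timestamps) is
# exactly what cannot be recovered from the HF export.
composer_checkpoint = {
    "state": {
        "model": hf_state_dict,
        # "optimizers": ...,  # missing: not present in the HF checkpoint
        # "schedulers": ...,  # missing: not present in the HF checkpoint
    },
    "rng": None,
}

torch.save(composer_checkpoint, "weights_only.pt")

# Reloading shows the weights survive the round trip, but nothing else does.
reloaded = torch.load("weights_only.pt", map_location="cpu", weights_only=False)
print(sorted(reloaded["state"]["model"].keys()))
```

So a weights-only "checkpoint" can be rebuilt this way, but resuming with exact optimizer momentum and scheduler position requires the original Composer checkpoints.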
Hi @NohTow, thanks for the amazing contribution... Any updates on this?
Hey,
Unfortunately no. As mentioned in the previous answer, the team took Christmas vacations after the release and most of us are still on vacation. But rest assured that releasing all the checkpoints is planned, and we'll make sure it is a priority when we come back. Sorry for the delay.
@NohTow, that's perfectly fine... Thanks for the update.
Hey @NohTow, thanks for the great work. Is there an update on the timeline for when the checkpoints will be released? :)
Hello,
We just had a meeting where we discussed the different things to do to enable reproduction and further pre-training, which include releasing the Composer checkpoints and configs, making sure everything runs smoothly, and adding proper documentation for people to run it.
This should be done soon, I am once again sorry for the delay, we had a lot of things to do lately!
Hi, it would be awesome if you could release (at least) the final checkpoints for the model versions already on Hugging Face, so I could use the code for further fine-tuning!
We uploaded the Composer training checkpoints for both ModernBERT-base and ModernBERT-large to Hugging Face this week. Will add instructions on how to use them over the next few days.
I'm attempting to use these checkpoints as Composer training checkpoints, but Composer hangs when loading them.
My yaml file has: `load_path: /workspace/checkpoints/modernbert_large_context_extension/ep0-ba49552-rank0.pt`
I downloaded that checkpoint from the link above for ModernBERT-large.
Then I run `composer main.py my-config.yaml`
I set the logging level of the Composer Trainer to DEBUG, and I see it hangs here during instantiation of the Trainer:
```
2025-12-04 16:53:52,619: rank0[10034][MainThread]: INFO: composer.trainer.trainer: Stepping schedulers every batch. To step schedulers every epoch, set `step_schedulers_every_batch=False`.
2025-12-04 16:53:52,620: rank0[10034][MainThread]: DEBUG: composer.utils.checkpoint: Loading checkpoint at /workspace/checkpoints/modernbert_large_context_extension/ep0-ba49552-rank0.pt
```
I set the timeout to 1 hr, and the model loading never completes. Eventually, I get a timeout error.
Is this file in the wrong format from which to load a checkpoint in Composer?
My goal is simply to continue pre-training the ModernBERT-large model with a domain-specific corpus. I thought the recommended approach for doing this was using the codebase in this repo.
I figured out my issue. The checkpoints cited above are not suitable for the Composer Trainer, even if one sets `load_weights_only` to `True`.
There are two distinct code paths for restoring from a previous checkpoint, with corresponding options that enable them in the yaml configs (`init_from_checkpoint` and `load_path`).
One must use the `init_from_checkpoint` mechanism, although this requires fixing a small bug in the code. (The bug is a `<=` operator that should be `<` in an assertion.)
I found it necessary to inspect the contents of the checkpoint and to read the code carefully. What little documentation is available here is just as likely to mislead as it is to help.
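For anyone else debugging this, a quick way to see what a checkpoint actually contains before handing it to the Trainer is to load it with plain torch and look at the top-level keys. The sketch below builds a dummy file so it is self-contained; with a real checkpoint you would just point `torch.load` at the downloaded `.pt` path (the exact key layout shown is an assumption about typical Composer checkpoints, not verified against the ModernBERT files):

```python
import torch

# Stand-in for a downloaded Composer checkpoint; with a real file, replace
# this block with: ckpt = torch.load("ep0-ba49552-rank0.pt", map_location="cpu")
torch.save(
    {
        "state": {
            "model": {"w": torch.zeros(1)},  # model weights
            "optimizers": {},                # training state, absent in weights-only exports
            "timestamp": {},
        },
        "rng": None,
    },
    "dummy_ckpt.pt",
)
ckpt = torch.load("dummy_ckpt.pt", map_location="cpu", weights_only=False)

# Top-level keys tell you whether this is a full training checkpoint
# (model + optimizers + schedulers) or a weights-only export.
print(sorted(ckpt.keys()))
print(sorted(ckpt["state"].keys()))
```

Comparing those keys against what the loading path in main.py expects makes it much easier to tell which of the two restore mechanisms a given file is meant for.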
> There are two distinct code paths for restoring from a previous checkpoint, with corresponding options that enable them in the yaml configs (`init_from_checkpoint` and `load_path`).
Hey,
The two options are for different things: `init_from_checkpoint` was created to initialize large from base with tiling.
`load_path` should be usable without issue; your idling issue is probably due to spinning, see #246.
As your goal is to continue training on another dataset, you do not have to do spinning. You can use:
```yaml
load_path: checkpoints/modernbert-base-context-extension/context-extension/ep0-ba52988-rank0.pt
autoresume: false
reset_time: true        # restarts the scheduler, dataloaders, etc. from step zero
restart_override: true  # resets optimizer hyperparameters (LR, WD, etc.), LR scheduler, and training microbatch size from the checkpoint's values
```
This blog post gives a lot of details on how to successfully continue training using those checkpoints.