ControlNet

tutorial_train.py does not produce checkpoints

Open jamesWalker55 opened this issue 2 years ago • 2 comments

Running the tutorial_train.py script creates logs under lightning_logs. However, the lightning_logs\version_X\checkpoints folders are all empty:

image

Is this intended behaviour, or is there a bug preventing the checkpoints from saving?

jamesWalker55, Mar 14 '23 00:03

The script has no code to save checkpoints. You need to add a checkpoint callback.

SwayStar123, Mar 14 '23 17:03

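from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    dirpath="path/to/your/work_dir",  # placeholder: directory where checkpoint files are written
    every_n_train_steps=1000,  # example value: save one checkpoint file every N training steps
    save_weights_only=False,  # False also saves optimizer and trainer state, so training can resume
)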

Then replace callbacks=[logger] with callbacks=[logger, checkpoint_callback] in the arguments of your trainer = pl.Trainer(...) call.

Since one epoch may take too long to wait for, you may want to save weights every fixed number of iterations instead. This code might help.
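
Putting the two steps together, here is a rough sketch of how the callback could be wired into the trainer. The helper names checkpoint_settings and build_trainer, the default directory, and the default step count are made up for illustration and are not part of tutorial_train.py. save_top_k=-1 is an extra assumption: with the defaults and no monitored metric, ModelCheckpoint keeps only the most recent file, and -1 keeps all of them. Any other Trainer arguments your script already uses are passed through unchanged.

```python
from pathlib import Path


def checkpoint_settings(work_dir, every_n_steps):
    """Build the ModelCheckpoint keyword arguments suggested in this thread."""
    if every_n_steps < 1:
        raise ValueError("every_n_steps must be a positive integer")
    return {
        "dirpath": str(Path(work_dir)),         # where checkpoint files are written
        "every_n_train_steps": every_n_steps,   # save every N training steps
        "save_weights_only": False,             # keep optimizer state for resuming
        "save_top_k": -1,                       # assumption: keep every checkpoint
    }


def build_trainer(logger, work_dir="./checkpoints", every_n_steps=1000, **trainer_kwargs):
    """Hypothetical helper: create a Trainer that saves periodic checkpoints.

    `logger` is the existing logging callback from the training script;
    `trainer_kwargs` are whatever other arguments the script already passes.
    """
    # Imported here so the settings above can be inspected without Lightning installed.
    import pytorch_lightning as pl
    from pytorch_lightning.callbacks import ModelCheckpoint

    checkpoint_callback = ModelCheckpoint(**checkpoint_settings(work_dir, every_n_steps))
    return pl.Trainer(callbacks=[logger, checkpoint_callback], **trainer_kwargs)
```

In the script, the existing trainer = pl.Trainer(...) line would then become something like trainer = build_trainer(logger, work_dir="path/to/your/work_dir", every_n_steps=1000), followed by the same trainer.fit call as before.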

feiyangsuo, Mar 30 '23 12:03