[Feat]: The current epoch's validation is skipped when continuing from last backup

DaStapo opened this issue • 1 comment

What happened?

Every time I resumed training with "Continue from last backup", the validation at the end of the current epoch was skipped (validation does run at the end of the following epoch, though). It prints out

validation_step: 0%| | 0/20 [00:00<?, ?it/s]

but then skips the validation and continues training. This means that if I paused and resumed training in the middle of every epoch, the validation loss would never be calculated (except at the very start). In case it matters, I have only tried this while training a LoRA for Flux Fill.

What did you expect would happen?

I would expect the validation loss to be calculated at the end of every epoch, regardless of whether the training run was resumed.
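A minimal sketch of the expected behavior, assuming a simplified training loop. The function `run_training` and its parameters are hypothetical and are not the trainer's actual code. Only the interrupted epoch resumes part-way through its training steps, while validation always runs over the full validation set at the end of every epoch, including the resumed one.

```python
def run_training(num_epochs, steps_per_epoch, val_batches,
                 resume_epoch=0, resume_step=0):
    """Return a log of (phase, epoch, index) events for a hypothetical loop.

    resume_epoch / resume_step describe where a backup was taken
    (hypothetical names, for illustration only).
    """
    events = []
    for epoch in range(resume_epoch, num_epochs):
        # Only the interrupted epoch skips the training steps already done.
        start = resume_step if epoch == resume_epoch else 0
        for step in range(start, steps_per_epoch):
            events.append(("train", epoch, step))
        # Validation deliberately ignores resume_step: every epoch,
        # resumed or not, is validated over all validation batches.
        for batch in range(val_batches):
            events.append(("validation", epoch, batch))
    return events


# Resume in the middle of epoch 1 (step 3 of 5) with 2 validation batches.
for event in run_training(2, 5, 2, resume_epoch=1, resume_step=3):
    print(event)
```

With this structure, pausing and resuming mid-epoch still produces a validation pass for that epoch, which is the behavior the report expects.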

Relevant log output

Generate and upload debug_report.log

No response

Jul 06 '25 07:07 DaStapo

True, but how is this a problem? Changing this to an enhancement.

Sep 07 '25 19:09 dxqb