min_epochs and EarlyStopping in conflict
Bug description
I have a problem where I use min_epochs because it can take a while before the training starts to converge.
EarlyStopping is triggered quite early, but I thought to set min_epochs appropriately to 'get over' that initial period.
However, even though training is converging by the time we reach min_epochs, EarlyStopping stops training as soon as min_epochs is reached, simply because it was triggered very early on in training.
I think that EarlyStopping should pick itself back up if we improve upon the monitored metric before reaching min_epochs.
Example Trainer config:
import lightning as L
from lightning.pytorch.callbacks import EarlyStopping

trainer = L.Trainer(
    max_epochs=10000,
    callbacks=[
        EarlyStopping(monitor="val_loss", mode="min", patience=100),
    ],
    min_epochs=1000,
)
Now imagine EarlyStopping triggers at epoch 100, but val_loss then improves from epoch 101 all the way to epoch 1000. Right now, training will still stop once epoch 1000 is reached.
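To make the scenario concrete, here is a toy model in plain Python. It is not Lightning's actual code: the function `simulate` and its `sticky` flag are hypothetical names used only for illustration, under the assumption that the stop request, once raised, is never cleared and min_epochs merely delays the exit.

```python
def simulate(val_losses, patience, min_epochs, sticky=True):
    """Return the 0-based epoch after which the toy training loop stops.

    Toy model only (assumption, not Lightning internals):
    sticky=True  -> a stop request is remembered forever once raised.
    sticky=False -> the stop request is re-evaluated every epoch.
    """
    best = float("inf")
    wait = 0
    should_stop = False
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: reset the patience counter.
            best, wait = loss, 0
            triggered = False
        else:
            wait += 1
            triggered = wait >= patience
        should_stop = (should_stop or triggered) if sticky else triggered
        # min_epochs only delays the exit; it does not clear the request.
        if should_stop and epoch + 1 >= min_epochs:
            return epoch
    return len(val_losses) - 1


# Flat for 5 epochs, then steadily improving for 15 more.
losses = [1.0] * 5 + [1.0 - 0.01 * i for i in range(1, 16)]
print(simulate(losses, patience=3, min_epochs=10, sticky=True))
print(simulate(losses, patience=3, min_epochs=10, sticky=False))
```

With the sticky request, the loop exits the moment min_epochs is reached even though the loss has been improving for several epochs. With the re-evaluated request, it runs to the end of the list.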
What version are you seeing the problem on?
v2.2
How to reproduce the bug
No response
Error messages and logs
No response
Environment
No response
More info
No response
I also see this, and I think the implementation would be better if EarlyStopping took precedence only after min_epochs is reached. As it stands right now, it is as if EarlyStopping does not exist, because training exits once min_epochs is reached no matter what.
I think the code in this issue would solve it: EarlyStopping with warmup.
from lightning.pytorch.callbacks import EarlyStopping


class EarlyStoppingWithWarmup(EarlyStopping):
    """
    EarlyStopping, except don't watch the first `warmup` epochs.
    """

    def __init__(self, warmup=10, **kwargs):
        super().__init__(**kwargs)
        self.warmup = warmup

    def on_validation_end(self, trainer, pl_module):
        # Skip the check entirely while still inside the warmup period.
        # Note: only the validation-end hook is overridden here.
        if (
            self._check_on_train_epoch_end
            or self._should_skip_check(trainer)
            or trainer.current_epoch < self.warmup
        ):
            return
        self._run_early_stopping_check(trainer)
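One consequence of the warmup approach is worth spelling out: because the check is skipped entirely during warmup, the best score is not tracked in that period either, so the patience counter effectively starts fresh at the first epoch after warmup. The sketch below is a toy patience counter in plain Python, not the callback itself, and `stop_epoch_with_warmup` is a hypothetical helper name.

```python
def stop_epoch_with_warmup(val_losses, patience, warmup):
    """Toy patience counter that ignores epochs before `warmup` (0-based).

    Returns the epoch at which stopping would be requested, or None.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if epoch < warmup:
            continue  # skipped entirely: no best-score update, no counting
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # stopping was never requested


plateau = [0.5] * 10
print(stop_epoch_with_warmup(plateau, patience=3, warmup=0))
print(stop_epoch_with_warmup(plateau, patience=3, warmup=4))
```

On a flat loss curve, the earliest possible stop moves from `patience` to `warmup + patience`, so setting `warmup` near min_epochs means training can run for up to `patience` epochs beyond min_epochs before stopping.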