Epoch is implicitly incremented if terminated on iteration
🐛 Bug description
The code below reproduces the problem:
from ignite.engine import Engine, Events
from ignite.utils import setup_logger
stop_iter = 50
epoch_length = 100
max_epochs = 2
trainer = Engine(lambda e, b: print(b, end=" "))
trainer.logger = setup_logger("trainer")
state = trainer.state
@trainer.on(Events.ITERATION_COMPLETED(every=stop_iter))
def stop():
    print("--> stop at {}".format(trainer.state.iteration))
    trainer.terminate()
data = list(range(epoch_length))
print("- Start from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
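The issue is that the iteration and epoch counters go out of sync: the epoch is implicitly incremented when a run is terminated mid-epoch and then resumed, so it no longer matches the number of completed iterations.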
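To make the expected relationship concrete, here is a minimal sketch of a consistency check. `expected_epoch` is a hypothetical helper written for illustration, not part of Ignite, and it assumes a fixed `epoch_length` and that `iteration` counts completed iterations.

```python
import math


def expected_epoch(iteration: int, epoch_length: int) -> int:
    """Return the 1-based epoch that contains the given completed iteration.

    Hypothetical helper, not part of Ignite. Assumes a fixed epoch_length.
    """
    if iteration <= 0:
        return 0  # nothing has run yet
    return math.ceil(iteration / epoch_length)


epoch_length = 100
# The script above terminates every 50 iterations, so runs end at 50, 100, ...
for iteration in (50, 100):
    print(iteration, expected_epoch(iteration, epoch_length))
```

Comparing this value with `state.epoch` after each `trainer.run(...)` call should make any mismatch visible.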
Environment
- PyTorch Version (e.g., 1.4):
- Ignite Version (e.g., 0.3.0):
- OS (e.g., Linux):
- How you installed Ignite (conda, pip, source):
- Python version:
- Any other relevant information:
@vfdev-5 is this solved? What is the error? This is what I got when running it:
$ python test.py
- Start from 0 iteration
2021-02-16 17:33:38,589 trainer INFO: Engine run starting with max_epochs=2.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 --> stop at 50
2021-02-16 17:33:38,591 trainer INFO: Terminate signaled. Engine will stop after current iteration is finished.
2021-02-16 17:33:38,591 trainer INFO: Epoch[1] Complete. Time taken: 00:00:00
2021-02-16 17:33:38,591 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 50 iteration | 1 epoch
-- Do something else
- Continue from 50 iteration
2021-02-16 17:33:38,591 trainer INFO: Engine run resuming from iteration 50, epoch 1 until 2 epochs
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 --> stop at 100
2021-02-16 17:33:38,593 trainer INFO: Terminate signaled. Engine will stop after current iteration is finished.
2021-02-16 17:33:38,593 trainer INFO: Epoch[2] Complete. Time taken: 00:00:00
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
-- Do something else
- Continue from 100 iteration
2021-02-16 17:33:38,593 trainer INFO: Engine run resuming from iteration 100, epoch 2 until 2 epochs
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
-- Do something else
- Continue from 100 iteration
2021-02-16 17:33:38,593 trainer INFO: Engine run resuming from iteration 100, epoch 2 until 2 epochs
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
-- Do something else
- Continue from 100 iteration
2021-02-16 17:33:38,593 trainer INFO: Engine run resuming from iteration 100, epoch 2 until 2 epochs
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
@sparkingdark there is no explicit error raised here, but the epoch value is wrong. Here is a snippet with a more explicit epoch check:
from ignite.engine import Engine, Events
from ignite.utils import setup_logger
stop_iter = 2
epoch_length = 15
max_epochs = 5
trainer = Engine(lambda e, b: print(b, end=" "))
trainer.logger = setup_logger("trainer")
state = trainer.state
@trainer.on(Events.ITERATION_COMPLETED(every=stop_iter))
def stop():
    print("--> stop at {}".format(trainer.state.iteration))
    trainer.terminate()
data = list(range(epoch_length))
print("- Start from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
assert state.epoch == 1, state.epoch
Also, note that a resumed run does not continue iterating over the data from where it stopped. It restarts from the first samples, which is wrong as well.
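As a rough illustration of what resuming from the current position could mean, the sketch below computes which batches of the current epoch are still unprocessed. `remaining_in_epoch` is a hypothetical helper, not Ignite's API or its actual fix. It assumes the data yields batches in a fixed order and that `iteration` counts completed iterations.

```python
from itertools import islice


def remaining_in_epoch(data, iteration, epoch_length):
    """Return the batches of the current epoch that have not been processed yet.

    Hypothetical helper, not part of Ignite. Assumes `data` yields batches in a
    fixed order and that `iteration` counts completed iterations.
    """
    offset = iteration % epoch_length  # position inside the current epoch
    return list(islice(data, offset, epoch_length))


data = list(range(15))
# After the first terminated run in the snippet above, iteration == 2.
batches = remaining_in_epoch(data, iteration=2, epoch_length=15)
print(batches[0], len(batches))
```

With these values the first remaining batch is `2` rather than `0`, which is the behaviour a resumed run would be expected to show.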
Okay, so we need a fix that resumes from the current position. Am I correct?
Well, this is a bit complicated to fix as is. I think it will be done as part of the Engine refactor that I started some time ago...
Okay, so should I try to solve it or look into other issues, @vfdev-5?
I'd suggest looking at other "help wanted" issues: https://github.com/pytorch/ignite/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22