Epoch is implicitly incremented if terminated on iteration

Open vfdev-5 opened this issue 5 years ago • 6 comments

🐛 Bug description

The code below reproduces the problem:

from ignite.engine import Engine, Events
from ignite.utils import setup_logger

stop_iter = 50
epoch_length = 100
max_epochs = 2

trainer = Engine(lambda e, b: print(b, end=" "))
trainer.logger = setup_logger("trainer")
state = trainer.state

# Ask the engine to terminate every `stop_iter` iterations.
@trainer.on(Events.ITERATION_COMPLETED(every=stop_iter))
def stop():
    print("--> stop at {}".format(trainer.state.iteration))
    trainer.terminate()

data = list(range(epoch_length))

print("- Start from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))

print("-- Do something else")

print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))

print("-- Do something else")

print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))

print("-- Do something else")

print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))

print("-- Do something else")

print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))

The issue is that the iteration and epoch counters drift apart: every terminated run increments the epoch even though the epoch has not been fully iterated. This is a bug.
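As a rough illustration of the relation that is expected to hold, here is a minimal sketch. It assumes iterations are counted from 1 and that the epoch is the 1-based index of the epoch containing the latest iteration (0 before anything has run). `expected_epoch` and `is_consistent` are hypothetical helpers written for this illustration, not ignite APIs:

```python
def expected_epoch(iteration: int, epoch_length: int) -> int:
    """Return the 1-based epoch that contains `iteration` (0 if nothing ran yet)."""
    if epoch_length <= 0:
        raise ValueError("epoch_length must be positive")
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    # Integer ceiling division: iterations 1..epoch_length belong to epoch 1, and so on.
    return (iteration + epoch_length - 1) // epoch_length


def is_consistent(iteration: int, epoch: int, epoch_length: int) -> bool:
    """Check that a reported (iteration, epoch) pair agrees with the assumption above."""
    return epoch == expected_epoch(iteration, epoch_length)


if __name__ == "__main__":
    # 50 iterations of a 100-iteration epoch: still inside epoch 1.
    print(is_consistent(iteration=50, epoch=1, epoch_length=100))
    # 100 iterations reported together with epoch 2: flagged as inconsistent.
    print(is_consistent(iteration=100, epoch=2, epoch_length=100))
```

Under this assumption, a pair such as iteration 100 with epoch 2 for `epoch_length = 100` is flagged, since only one full epoch has been iterated.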

vfdev-5 • Oct 14 '20 09:10

@vfdev-5 is this solved? What is the error? This is what I got when running the script:

$ python test.py
- Start from 0 iteration
2021-02-16 17:33:38,589 trainer INFO: Engine run starting with max_epochs=2.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 --> stop at 50
2021-02-16 17:33:38,591 trainer INFO: Terminate signaled. Engine will stop after current iteration is finished.
2021-02-16 17:33:38,591 trainer INFO: Epoch[1] Complete. Time taken: 00:00:00
2021-02-16 17:33:38,591 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 50 iteration | 1 epoch
-- Do something else
- Continue from 50 iteration
2021-02-16 17:33:38,591 trainer INFO: Engine run resuming from iteration 50, epoch 1 until 2 epochs
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 --> stop at 100
2021-02-16 17:33:38,593 trainer INFO: Terminate signaled. Engine will stop after current iteration is finished.
2021-02-16 17:33:38,593 trainer INFO: Epoch[2] Complete. Time taken: 00:00:00
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
-- Do something else
- Continue from 100 iteration
2021-02-16 17:33:38,593 trainer INFO: Engine run resuming from iteration 100, epoch 2 until 2 epochs
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
-- Do something else
- Continue from 100 iteration
2021-02-16 17:33:38,593 trainer INFO: Engine run resuming from iteration 100, epoch 2 until 2 epochs
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
-- Do something else
- Continue from 100 iteration
2021-02-16 17:33:38,593 trainer INFO: Engine run resuming from iteration 100, epoch 2 until 2 epochs
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch

sparkingdark • Feb 16 '21 12:02

@sparkingdark no explicit error is raised here, but the epoch value is wrong. Here is a snippet with a more explicit epoch check:

from ignite.engine import Engine, Events
from ignite.utils import setup_logger

stop_iter = 2
epoch_length = 15
max_epochs = 5

trainer = Engine(lambda e, b: print(b, end=" "))
trainer.logger = setup_logger("trainer")
state = trainer.state

@trainer.on(Events.ITERATION_COMPLETED(every=stop_iter))
def stop():
    print("--> stop at {}".format(trainer.state.iteration))
    trainer.terminate()

data = list(range(epoch_length))

print("- Start from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))

print("-- Do something else")

print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))

print("-- Do something else")

print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))

print("-- Do something else")

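# Three runs of 2 iterations each give 6 iterations, fewer than epoch_length = 15,
# so the epoch should still be 1. This check is expected to fail while the bug is present.
assert state.epoch == 1, state.epoch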

Also, note that a resumed run does not continue iterating over the data but restarts from the first samples, which is wrong as well.
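One way to picture the expected resume behaviour, purely as a sketch: skip the samples already consumed in the current epoch before continuing. `remaining_batches` is a hypothetical helper, not part of ignite, and it assumes the data is a finite, re-iterable collection with exactly `epoch_length` items:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def remaining_batches(data: Iterable[T], iteration: int, epoch_length: int) -> Iterator[T]:
    """Yield the batches of the current epoch that have not been processed yet."""
    if epoch_length <= 0:
        raise ValueError("epoch_length must be positive")
    # Number of batches already consumed in the current epoch.
    # A multiple of epoch_length gives 0, so a fresh epoch starts from the first sample.
    offset = iteration % epoch_length
    return islice(iter(data), offset, epoch_length)


if __name__ == "__main__":
    data = list(range(15))
    # After 2 iterations of a 15-iteration epoch, continue with the third sample.
    print(list(remaining_batches(data, iteration=2, epoch_length=15)))
```

This only covers data that can be re-iterated from the start. Resuming a one-shot iterator or a shuffled loader would need extra state, which is part of why the fix is not trivial.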

vfdev-5 • Feb 16 '21 12:02

Okay, so this needs a fix that resumes from the current value. Am I correct?

sparkingdark • Feb 16 '21 13:02

Well, this is a bit complicated to fix as is. I think this will be done with the Engine refactor that I initiated some time ago...

vfdev-5 • Feb 16 '21 13:02

Okay, so should I try to solve it or look into other issues, @vfdev-5?

sparkingdark • Feb 18 '21 05:02

I'd suggest looking at other "help wanted" issues: https://github.com/pytorch/ignite/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22

vfdev-5 • Feb 18 '21 08:02