Missing optimizer step
If `max_steps` or the length of the data is not divisible by `gradient_accumulation_steps`, some gradients are lost, because the update only happens at `if (step + 1) % gradient_accumulation_steps == 0:`.
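As a rough illustration of how many updates this drops, here is a small counting sketch (the batch count and accumulation value are made up for the example, not taken from the script):

```python
# Illustrative numbers only: 10 batches with accumulation over 3 steps.
num_batches = 10
gradient_accumulation_steps = 3

# Updates fire only when (step + 1) is a multiple of gradient_accumulation_steps.
updates = sum(1 for step in range(num_batches)
              if (step + 1) % gradient_accumulation_steps == 0)

print(updates)  # 3 updates, consuming 9 batches
print(num_batches - updates * gradient_accumulation_steps)  # 1 batch whose gradient is never applied
```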
Hi @FlorisFok, do you have suggestions as to how this should be fixed?
Hi @timoschick, by adding an `or` condition to the gradient accumulation `if` statement. This extra condition would also fire when the loop reaches the final batch. First define:
```python
last_batch = len(train_dataloader) - 1
```
Then modify the update condition to:
```python
if (step + 1) % gradient_accumulation_steps == 0 or last_batch == b_nr:
```
Here `b_nr` (the batch number) is the index yielded by the `enumerate` call over the dataloader. In theory the `step` variable already in the script could be used for this, but it currently behaves exactly like `global_step`; I think that is also a mistake, though that depends on how the two are meant to be defined.
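For what it's worth, here is a minimal, self-contained sketch of the loop with that extra condition. The toy model, optimizer and dataloader are stand-ins for the objects in the real script, and the single index `b_nr` plays the role of both `step` and the batch number:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real model, optimizer and data.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(10, 4), torch.randn(10, 1))
train_dataloader = DataLoader(dataset, batch_size=1)
gradient_accumulation_steps = 3  # 10 batches is not divisible by 3

last_batch = len(train_dataloader) - 1

for b_nr, (x, y) in enumerate(train_dataloader):
    loss = nn.functional.mse_loss(model(x), y) / gradient_accumulation_steps
    loss.backward()

    # Update on every full accumulation window, and also on the final batch,
    # so the trailing partial window (the 10th batch here) is not dropped.
    if (b_nr + 1) % gradient_accumulation_steps == 0 or b_nr == last_batch:
        optimizer.step()
        optimizer.zero_grad()
```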