
Gradient Accumulation in Axlearn

apoorvtintin opened this issue on May 13, 2024

Gradient accumulation allows training with larger effective batch sizes without scaling out to more devices.

Added a new learner type: learner.klass: 'axlearn.common.learner.AccumulatedLearner'

At a high level, the optimization does the following (a minimal sketch follows the list):

  1. The input batch is split into equal-sized microbatches.
  2. A buffer is created for accumulating gradients and metrics.
  3. The forward and backward passes are run for each microbatch in a loop, summing the gradients and aggregating the metrics.
  4. Gradients are averaged across microbatches and metrics are normalized.
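
A minimal JAX sketch of the loop described above, not the actual axlearn implementation. The names (`loss_fn`, `params`, `batch`, `num_microbatches`) are illustrative assumptions; the metric aggregated here is just the loss.

```python
import jax
import jax.numpy as jnp


def accumulate_gradients(loss_fn, params, batch, num_microbatches):
    """Splits `batch` into microbatches, sums grads and loss, then averages."""
    # 1. Split the input batch into equal-sized microbatches along axis 0.
    microbatches = jax.tree_util.tree_map(
        lambda x: x.reshape((num_microbatches, -1) + x.shape[1:]), batch
    )

    grad_fn = jax.value_and_grad(loss_fn)

    # 2. Buffers for accumulated gradients and metrics (here, only the loss).
    init_grads = jax.tree_util.tree_map(jnp.zeros_like, params)
    init_loss = jnp.zeros(())

    # 3. Forward/backward pass per microbatch, summing gradients and loss.
    def body(i, carry):
        grad_acc, loss_acc = carry
        microbatch = jax.tree_util.tree_map(lambda x: x[i], microbatches)
        loss, grads = grad_fn(params, microbatch)
        grad_acc = jax.tree_util.tree_map(jnp.add, grad_acc, grads)
        return grad_acc, loss_acc + loss

    grad_acc, loss_acc = jax.lax.fori_loop(
        0, num_microbatches, body, (init_grads, init_loss)
    )

    # 4. Average gradients across microbatches and normalize the metric.
    grads = jax.tree_util.tree_map(lambda g: g / num_microbatches, grad_acc)
    return grads, loss_acc / num_microbatches
```

The averaged gradients can then be passed to the optimizer exactly as if they had come from a single large batch, which is what makes the result (up to batch-norm-style statistics) equivalent to training with the full batch size.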

Configuration changes:

  • The number of microbatches is specified during configuration through the option accumulation_microbatches in the trainer and microbatches in the learner.
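
A rough sketch of what the configuration might look like. The exact config field paths are assumptions based on the option names above and have not been verified against the axlearn codebase:

```python
# Illustrative only: field names/nesting are assumed, not actual axlearn config structure.
trainer_config = {
    "accumulation_microbatches": 4,  # trainer-side option
    "learner": {
        "klass": "axlearn.common.learner.AccumulatedLearner",
        "microbatches": 4,  # learner-side option, matching the trainer setting
    },
}
```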
