
Add adjust_lr function for learning rate schedules

Open corbt opened this issue 6 months ago • 1 comment

Summary

  • Added adjust_lr function to iterate_dataset.py to support learning rate schedules with warmup and cooldown phases
  • Updated DatasetBatch to include total_steps field needed for LR calculations
  • Simplified API design: constant LR by default, linear decay achievable via cooldown_length

Implementation Details

The adjust_lr function supports:

  • warmup_length: Linear ramp from 0 to the base LR (an int is interpreted as a number of steps, a float as a fraction of total training steps)
  • cooldown_length: Linear decay from the base LR to 0 (same int/float semantics)
  • No explicit schedule type is needed; a linear decay over the entire run is achievable by setting cooldown_length=1.0 (see the sketch below)
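
As a rough illustration, here is a minimal sketch of how such a schedule could be computed. It assumes DatasetBatch exposes the new total_steps field plus a step field for the current step; the step field name, the helper name, and the exact int/float resolution are assumptions, not the PR's actual code.

from dataclasses import dataclass

@dataclass
class DatasetBatch:
    # Hypothetical minimal shape; the real class also carries the batch data.
    step: int          # current training step (assumed field name)
    total_steps: int   # total steps in the run (field added in this PR)

def _resolve_length(length: int | float, total_steps: int) -> int:
    # Assumption: an int is a step count, a float is a fraction of total_steps.
    if isinstance(length, float):
        return int(length * total_steps)
    return length

def adjust_lr(
    batch: DatasetBatch,
    learning_rate: float,
    warmup_length: int | float = 0,
    cooldown_length: int | float = 0,
) -> float:
    """Constant LR with optional linear warmup and cooldown (sketch)."""
    warmup_steps = _resolve_length(warmup_length, batch.total_steps)
    cooldown_steps = _resolve_length(cooldown_length, batch.total_steps)
    cooldown_start = batch.total_steps - cooldown_steps

    if warmup_steps > 0 and batch.step < warmup_steps:
        # Linear ramp from 0 up to the base learning rate.
        return learning_rate * (batch.step + 1) / warmup_steps
    if cooldown_steps > 0 and batch.step >= cooldown_start:
        # Linear decay from the base learning rate down to 0.
        remaining = batch.total_steps - batch.step
        return learning_rate * remaining / cooldown_steps
    return learning_rate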

Status

This is a DRAFT PR; we'll finalize it once we've had a chance to test it on some real runs.

Context

We've had good success with a constant learning rate in our experiments, but warmup and cooldown phases may offer some benefit, which we need to investigate empirically.

🤖 Generated with Claude Code

corbt · Jul 16 '25

Added support for negative cooldown_length values as discussed:

  • Negative values now specify the exact step where cooldown starts (e.g., cooldown_length=-20 means cooldown begins at step 20)
  • This makes it easy to implement linear decay after warmup: warmup_length=20, cooldown_length=-20
  • The implementation ensures cooldown always starts after warmup completes (see the sketch below)
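
A sketch of how the negative case could be resolved into a concrete cooldown start step, following the rules above; the helper name and the clamping details are assumptions.

def _resolve_cooldown_start(
    cooldown_length: int | float,
    warmup_steps: int,
    total_steps: int,
) -> int:
    """Return the step at which cooldown begins (sketch)."""
    if cooldown_length < 0:
        # Negative values name the exact step where cooldown starts,
        # e.g. cooldown_length=-20 -> cooldown begins at step 20.
        start = int(-cooldown_length)
    else:
        # Positive values count back from the end of training
        # (int = step count, float = fraction of total_steps).
        steps = int(cooldown_length * total_steps) if isinstance(cooldown_length, float) else cooldown_length
        start = total_steps - steps
    # Cooldown always starts after warmup completes.
    return max(start, warmup_steps)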

Example usage for your case:

lr = adjust_lr(batch, learning_rate=1e-4, warmup_length=20, cooldown_length=-20)

This will:

  1. Warm up from 0 to 1e-4 over the first 20 steps
  2. Start cooldown at step 20 and linearly decay to 0 by the end of training
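
To make the numbers concrete, with a hypothetical 100-step run (the total is illustrative, not from the PR), the sketches above would yield roughly:

# Illustrative values only, assuming total_steps=100:
#   step 0  -> 5e-6    (first warmup step)
#   step 19 -> 1e-4    (warmup complete)
#   step 20 -> 1e-4    (cooldown begins; 80 steps remain)
#   step 60 -> 5e-5    (halfway through cooldown)
#   step 99 -> 1.25e-6 (final step, approaching 0)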

corbt · Jul 16 '25