Add adjust_lr function for learning rate schedules
Summary
- Added `adjust_lr` function to `iterate_dataset.py` to support learning rate schedules with warmup and cooldown phases
- Updated `DatasetBatch` to include a `total_steps` field needed for LR calculations
- Simplified API design: constant LR by default, linear decay achievable via `cooldown_length`
Implementation Details
The `adjust_lr` function supports:
- `warmup_length`: linear ramp from 0 to the base LR (can be an int for a step count or a float for a ratio of total steps)
- `cooldown_length`: linear decay from the base LR to 0 (can be an int for a step count or a float for a ratio of total steps)
- No explicit schedule type needed: a linear schedule over the entire run is achievable by setting `cooldown_length=1.0`
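For context, here is a minimal sketch of what this schedule logic might look like. The actual implementation lives in `iterate_dataset.py` and may differ; in particular, the `step` field on `DatasetBatch` is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DatasetBatch:
    # `total_steps` is the field this PR adds; `step` (the current step index)
    # is assumed here for illustration and may be tracked differently.
    step: int
    total_steps: int


def adjust_lr(
    batch: DatasetBatch,
    learning_rate: float,
    warmup_length: Union[int, float] = 0,
    cooldown_length: Union[int, float] = 0,
) -> float:
    """Constant LR by default, with optional linear warmup and cooldown phases."""

    def to_steps(length: Union[int, float]) -> int:
        # Floats are read as a fraction of total steps, ints as absolute step counts.
        return round(length * batch.total_steps) if isinstance(length, float) else length

    warmup_steps = to_steps(warmup_length)
    cooldown_steps = to_steps(cooldown_length)

    if warmup_steps > 0 and batch.step < warmup_steps:
        # Linear ramp from 0 up to the base learning rate.
        return learning_rate * (batch.step + 1) / warmup_steps

    cooldown_start = batch.total_steps - cooldown_steps
    if cooldown_steps > 0 and batch.step >= cooldown_start:
        # Linear decay from the base learning rate down to 0.
        return learning_rate * (batch.total_steps - batch.step) / cooldown_steps

    return learning_rate
```

With the defaults this simply returns `learning_rate` unchanged, and `cooldown_length=1.0` turns the whole run into a linear decay.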
Status
This is a DRAFT PR; we'll mark it as final once we've had a chance to test it on some real runs.
Context
We've had good success with a constant learning rate in our experiments, but there may be some benefit to warmup and cooldown phases that we need to investigate through empirical testing.
🤖 Generated with Claude Code
Added support for negative `cooldown_length` values as discussed:
- Negative values now specify the exact step where cooldown starts (e.g., `cooldown_length=-20` means cooldown begins at step 20)
- This makes it easy to implement linear decay after warmup: `warmup_length=20, cooldown_length=-20`
- The implementation ensures cooldown always starts after warmup completes
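One possible way to slot this into the schedule sketch above (the helper name and exact rounding here are assumptions, not the code in this PR):

```python
from typing import Union


def resolve_cooldown(
    cooldown_length: Union[int, float],
    warmup_steps: int,
    total_steps: int,
) -> tuple[int, int]:
    """Hypothetical helper returning (cooldown_start, cooldown_steps).

    Negative values name the step where cooldown begins; positive ints are
    durations in steps; positive floats are fractions of total_steps.
    """
    if cooldown_length < 0:
        # e.g. -20 -> cooldown begins at step 20, but never before warmup ends
        start = max(-int(cooldown_length), warmup_steps)
        return start, total_steps - start
    steps = round(cooldown_length * total_steps) if isinstance(cooldown_length, float) else int(cooldown_length)
    return total_steps - steps, steps
```

For example, `resolve_cooldown(-20, warmup_steps=20, total_steps=100)` gives `(20, 80)`: cooldown begins at step 20 and runs for the remaining 80 steps.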
Example usage for your case:
```python
lr = adjust_lr(batch, learning_rate=1e-4, warmup_length=20, cooldown_length=-20)
```
This will:
- Warm up from 0 to 1e-4 over the first 20 steps
- Start cooldown at step 20 and linearly decay to 0 by the end of training