Saving torchvision checkpoints based on staged recipe phase
- Renames `BaseManager.phase` -> `BaseManager.phase_at_end_of`
- Clarifies behavior
- Integrates saving checkpoints based on phases into torchvision
Test Plan
Ran the following recipe:
```yaml
version: 1.1.0

training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0
    end_epoch: 15

  - !SetLearningRateModifier
    start_epoch: 0.0
    learning_rate: 0.001

pruning_modifiers:
  - !GMPruningModifier
    init_sparsity: 0.05
    final_sparsity: 0.85
    start_epoch: 5.0
    end_epoch: 10.0
    update_frequency: 1.0
    params: ["re:.*conv..weight*"]

quantization_modifiers:
  - !QuantizationModifier
    start_epoch: 11.0
    freeze_bn_stats_epoch: 12.0
    disable_quantization_observer_epoch: 13.0
```
Which generated the following directory after running:
- best_dense.pth (best_dense.txt contains epoch 3)
- best_pruned_quantized.pth (best_pruned_quantized.txt contains epoch 13)
- best_pruned.pth (best_pruned.txt contains epoch 10)
- last_dense.pth (last_dense.txt contains epoch 4)
- last_pruned_quantized.pth (last_pruned_quantized.txt contains epoch 14)
- last_pruned.pth (last_pruned.txt contains epoch 10)
- last.pth (last.txt contains epoch 14)
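The checkpoint layout above could be produced by saving logic along these lines. This is a hypothetical sketch, not the actual torchvision integration code; `save_phase_checkpoints` and the per-phase best-metric tracking are illustrative names, and the `torch.save` call is stubbed out to keep the sketch self-contained.

```python
import os


def save_phase_checkpoints(output_dir, epoch, phase, acc, best_acc):
    """Save last/best checkpoints suffixed by the recipe phase at end of epoch.

    best_acc is a dict mapping phase name -> best accuracy seen so far in
    that phase; it is updated in place. phase is None while a modifier
    (e.g. pruning) is still in progress, in which case only last.pth updates.
    """

    def _save(stem):
        path = os.path.join(output_dir, stem)
        # torch.save(model.state_dict(), path + ".pth")  # stubbed: no torch here
        # record which epoch this checkpoint came from, mirroring the .txt files above
        with open(path + ".txt", "w") as f:
            f.write(f"epoch {epoch}\n")

    _save("last")  # always overwrite the overall latest checkpoint
    if phase is not None:
        _save(f"last_{phase}")
        if acc > best_acc.get(phase, float("-inf")):
            best_acc[phase] = acc
            _save(f"best_{phase}")
```

One consequence of this scheme, visible in the listing above, is that a `best_<phase>` checkpoint is only ever compared against other checkpoints from the same phase, so e.g. best_dense never silently holds a pruned model.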
And the following output:
```shell
sparseml.image_classification.train \
  --recipe resnet18-pq.yaml \
  --dataset-path ~/.cache/nm_datasets/imagenette/imagenette-320/ \
  --arch-key resnet18 \
  --output-dir ./runs
```

```
INFO:sparseml.pytorch.torchvision.train:Finished epoch 0 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 1 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 2 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 3 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 4 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 5 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 6 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 7 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 8 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 9 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 10 in phase pruned
INFO:sparseml.pytorch.torchvision.train:Finished epoch 11 in phase pruned_quantized
INFO:sparseml.pytorch.torchvision.train:Finished epoch 12 in phase pruned_quantized
INFO:sparseml.pytorch.torchvision.train:Finished epoch 13 in phase pruned_quantized
INFO:sparseml.pytorch.torchvision.train:Finished epoch 14 in phase pruned_quantized
```
Noting that the following transitions are correct:
- epochs 0-4 end in phase dense
- the pruning start epoch is 5, so at the END of epoch 5 pruning is still in progress -> phase is None
- the pruning end epoch is 10, so at the END of epoch 10 pruning is complete -> phase is pruned
- the quantization start epoch is 11, so at the END of epoch 11 quantization has been applied -> phase is pruned_quantized
LGTM, just curious, what was the motivation for the change?
This method of saving was chosen to standardize across integrations. It also makes it clearer whether a checkpoint is dense, pruned, or quantized. Previously, best.pt could contain any of these versions; notably, it could still be a dense model.