Apoorv Gupta
*Issue #, if available:*
*Description of changes:* Updated the SMDDP MNIST training example with the new APIs and additional information.
*Testing done:* Yes, tested on SageMaker.
Gradient accumulation allows training with larger batch sizes without scaling out. Added a new learner type:

```
learner.klass: 'axlearn.common.learner.AccumulatedLearner'
```

At a high level, the optimization does the following:
1. Input batch...
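The accumulation scheme can be sketched in plain JAX (a minimal illustration, not the `AccumulatedLearner` implementation; `loss_fn` and the microbatch split are assumptions for the demo):

```python
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Simple squared-error loss, for illustration only.
    x, y = batch
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

def accumulated_grads(params, big_batch, num_microbatches):
    """Split a large batch into microbatches and average their gradients,
    so only one microbatch's activations are live at a time."""
    x, y = big_batch
    xs = jnp.reshape(x, (num_microbatches, -1) + x.shape[1:])
    ys = jnp.reshape(y, (num_microbatches, -1) + y.shape[1:])

    def step(acc, micro):
        g = jax.grad(loss_fn)(params, micro)
        return jax.tree_util.tree_map(jnp.add, acc, g), None

    zero = jax.tree_util.tree_map(jnp.zeros_like, params)
    total, _ = jax.lax.scan(step, zero, (xs, ys))
    return jax.tree_util.tree_map(lambda g: g / num_microbatches, total)

params = jnp.ones((4, 2))
x, y = jnp.ones((8, 4)), jnp.zeros((8, 2))
grads = accumulated_grads(params, (x, y), num_microbatches=4)
```

With equal-sized microbatches and a mean loss, the averaged microbatch gradients equal the full-batch gradient, so the result matches training at the larger batch size.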
This PR enables the use of Neuron devices in AXLearn for model training.
- Chooses the correct mesh for TRN devices for Fuji 7B via the mesh selector flag `--mesh_selector=neuron-trn1.32xlarge-64`.
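The mesh selector maps a hardware string to a mesh shape. A minimal sketch of that idea (the registry values and the `cpu-demo-8` entry are assumptions so it runs on CPU; the real AXLearn mesh rules differ):

```python
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"  # emulate 8 devices on CPU
import jax
import numpy as np
from jax.sharding import Mesh

# Illustrative registry: mesh selector string -> (data, model) mesh shape.
MESH_RULES = {
    "neuron-trn1.32xlarge-64": (16, 4),  # hypothetical layout for 64 Trainium cores
    "cpu-demo-8": (4, 2),                # demo entry so this sketch runs anywhere
}

def build_mesh(selector):
    data, model = MESH_RULES[selector]
    grid = np.asarray(jax.devices()).reshape(data, model)
    return Mesh(grid, axis_names=("data", "model"))

mesh = build_mesh("cpu-demo-8")
```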
Increases memory efficiency during large-scale training: input batches and labels are sharded along the 'data' axis. Added a new input data sharding option, `DataPartitionType.DATA`.
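Sharding the batch dimension along a 'data' mesh axis can be sketched as follows (an illustrative JAX snippet, not AXLearn's `DataPartitionType.DATA` code path; the CPU device emulation is for demonstration):

```python
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"  # emulate 8 devices
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One-axis mesh named 'data'; the batch dimension is split across it,
# so each device holds only batch_size / num_devices examples.
mesh = Mesh(np.asarray(jax.devices()), axis_names=("data",))
sharding = NamedSharding(mesh, P("data"))

batch = jnp.ones((32, 128))   # (batch, features)
labels = jnp.zeros((32,))
batch = jax.device_put(batch, sharding)
labels = jax.device_put(labels, sharding)
```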
Saving out-projection improves training throughput while still fitting in the mesh defined by `neuron-(trn2|trn2n).48xlarge-64`.
Allow fallback from a multi-granule mesh to the standard mesh, as the standard mesh provides better performance on TRN2.
- Added corresponding tests for fallback and mesh creation on TRN2.
- Switch...
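A sketch of the fallback idea (illustrative, not the AXLearn implementation; on single-slice hardware such as this CPU demo, the granule-aware path and the fallback produce the same mesh shape):

```python
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"  # emulate 8 devices
import jax
import numpy as np
from jax.sharding import Mesh

def create_mesh(shape, hybrid=True):
    """Try a granule-aware (multi-granule) mesh; fall back to a standard
    row-major mesh when that construction is unavailable or fails."""
    devices = np.asarray(jax.devices())
    grid = None
    if hybrid:
        try:
            from jax.experimental import mesh_utils
            # Groups devices by granule (slice) first; with one granule the
            # DCN mesh shape is (1, 1).
            grid = mesh_utils.create_hybrid_device_mesh(shape, (1, 1), list(devices.flat))
        except Exception:
            grid = None  # fall back below
    if grid is None:
        # Standard fallback: simple row-major reshape of the flat device list.
        grid = devices.reshape(shape)
    return Mesh(grid, axis_names=("data", "model"))

mesh = create_mesh((4, 2))
```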