Apoorv Gupta

Results: 6 pull requests by Apoorv Gupta

*Issue #, if available:*

*Description of changes:* Updated the SMDDP MNIST training example with the new APIs and information.

*Testing done:* Yes, tested on SageMaker.

## Merge Checklist

_Put an `x` in...

Gradient accumulation allows training with higher effective batch sizes without scaling out. Added a new learner type:

```
learner.klass: 'axlearn.common.learner.AccumulatedLearner'
```

At a high level, the optimization does the following:

1. Input batch...
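The core idea behind gradient accumulation can be sketched in plain JAX: split the batch into microbatches, sum their gradients, and average. This is an illustrative stand-in (hypothetical `loss_fn` and parameters), not AxLearn's `AccumulatedLearner` implementation.

```python
# Minimal gradient-accumulation sketch in JAX. The model and loss are
# hypothetical stand-ins; only the accumulation pattern mirrors the PR's idea.
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Simple linear model with a mean-squared-error loss.
    pred = x @ params["w"]
    return jnp.mean((pred - y) ** 2)


def accumulated_grads(params, batch_x, batch_y, num_microbatches):
    """Split the batch into microbatches and average their gradients."""
    micro_x = jnp.reshape(batch_x, (num_microbatches, -1) + batch_x.shape[1:])
    micro_y = jnp.reshape(batch_y, (num_microbatches, -1) + batch_y.shape[1:])

    def step(acc, micro):
        x, y = micro
        grads = jax.grad(loss_fn)(params, x, y)
        return jax.tree_util.tree_map(jnp.add, acc, grads), None

    zeros = jax.tree_util.tree_map(jnp.zeros_like, params)
    total, _ = jax.lax.scan(step, zeros, (micro_x, micro_y))
    # Equal-sized microbatches, so the average matches the full-batch gradient.
    return jax.tree_util.tree_map(lambda g: g / num_microbatches, total)


params = {"w": jnp.ones((4, 1))}
x = jnp.arange(32.0).reshape(8, 4)
y = jnp.ones((8, 1))
g_accum = accumulated_grads(params, x, y, num_microbatches=4)
g_full = jax.grad(loss_fn)(params, x, y)
```

Because each microbatch is the same size, averaging microbatch gradients reproduces the full-batch gradient while only one microbatch's activations are live at a time.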

This PR enables the use of Neuron devices in AxLearn for model training.

- Chooses the correct mesh for TRN devices for Fuji 7B with the mesh selector flag `--mesh_selector=neuron-trn1.32xlarge-64`
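A mesh selector of this kind typically maps a hardware-descriptor string to a mesh shape via pattern rules. The sketch below is hypothetical (the rule table and the `(1, 64)` shape are illustrative assumptions, not AxLearn's actual mesh rules):

```python
# Hypothetical mesh-rule lookup: map a --mesh_selector string to a mesh shape.
# The patterns and shapes here are assumptions for illustration only.
import re

# Assumed mapping: selector regex -> (data, fsdp) mesh axis sizes.
MESH_RULES = [
    (r"neuron-trn1\.32xlarge-64", (1, 64)),
    (r".*", (-1, 1)),  # default: put all devices on the data axis
]


def select_mesh_shape(mesh_selector: str):
    """Return the first mesh shape whose pattern fully matches the selector."""
    for pattern, shape in MESH_RULES:
        if re.fullmatch(pattern, mesh_selector):
            return shape
    raise ValueError(f"No mesh rule matches {mesh_selector!r}")


print(select_mesh_shape("neuron-trn1.32xlarge-64"))  # (1, 64)
```

First-match-wins ordering lets a specific hardware rule override a catch-all default.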

Increases memory efficiency during large-scale training: input batches and labels are sharded along the 'data' axis. Added a new input data sharding option, `DataPartitionType.DATA`.
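Sharding a batch along a named 'data' mesh axis can be illustrated with generic JAX sharding APIs; this is a sketch of the concept, not AxLearn's `DataPartitionType` plumbing:

```python
# Sketch: place an input batch and labels on a mesh, sharded along 'data'.
# The mesh and array shapes are illustrative.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Build a 1-D mesh over all available devices, with a single 'data' axis.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

batch = jnp.ones((8, 16))
labels = jnp.ones((8,), dtype=jnp.int32)

# Shard the leading (batch) dimension of both arrays along the 'data' axis,
# so each device holds only its slice of the batch.
sharding = NamedSharding(mesh, PartitionSpec("data"))
batch = jax.device_put(batch, sharding)
labels = jax.device_put(labels, sharding)
```

Each device then materializes only `batch_size / num_devices` examples, which is where the memory saving comes from.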

Saving the out-projection improves training throughput while still fitting in the mesh defined by `neuron-(trn2|trn2n).48xlarge-64`.
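Assuming "saving the out-projection" refers to a rematerialization (activation checkpointing) policy, the idea can be expressed in plain JAX by naming that activation and saving only it across the backward pass. The block below is a hedged sketch with a toy stand-in for the attention math; the name `"out_proj"` is illustrative:

```python
# Sketch: keep (rather than recompute) only the named out-projection
# activation under jax.checkpoint. The "attention" math is a toy stand-in.
import jax
import jax.numpy as jnp
from jax.ad_checkpoint import checkpoint_name


def attention_block(x, w_qkv, w_out):
    h = jnp.tanh(x @ w_qkv)                       # stand-in for attention math
    out = checkpoint_name(h @ w_out, "out_proj")  # tag the out-projection
    return out


# Save only activations tagged "out_proj"; everything else is recomputed
# in the backward pass, trading compute for memory.
policy = jax.checkpoint_policies.save_only_these_names("out_proj")
block = jax.checkpoint(attention_block, policy=policy)

x = jnp.ones((4, 8))
w_qkv = jnp.ones((8, 8))
w_out = jnp.ones((8, 8))
loss = lambda *args: jnp.sum(block(*args))
grads = jax.grad(loss, argnums=(1, 2))(x, w_qkv, w_out)
```

Saving an activation that is expensive to recompute (like an output projection) speeds up the backward pass at the cost of some memory, which is the throughput/fit trade-off the PR describes.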

Allow fallback to a standard mesh for the multi-granule mesh, as such a mesh provides better performance on TRN2.

- Added corresponding tests for fallback and mesh creation for TRN2.
- Switch...
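The fallback pattern can be sketched with JAX's mesh utilities: try to build the multi-granule (hybrid) mesh, and fall back to a standard device mesh if that fails. The shapes and the try/except structure are illustrative assumptions, not the actual TRN2 topology logic:

```python
# Sketch: prefer a hybrid (multi-granule) device mesh, fall back to a
# standard one. Shapes are illustrative, not the real TRN2 topology.
import numpy as np
import jax
from jax.experimental import mesh_utils


def build_mesh(mesh_shape, dcn_mesh_shape=None):
    """Build a hybrid mesh when a DCN shape is given; otherwise (or on
    failure) fall back to a standard device mesh."""
    try:
        if dcn_mesh_shape is not None:
            # Multi-granule mesh: inner shape per granule, outer shape
            # across granules (e.g. across nodes).
            return mesh_utils.create_hybrid_device_mesh(
                mesh_shape, dcn_mesh_shape=dcn_mesh_shape)
        return mesh_utils.create_device_mesh(mesh_shape)
    except (ValueError, AssertionError):
        # Fallback: standard mesh over however many devices exist.
        return np.array(jax.devices()).reshape(-1, 1)


mesh_devices = build_mesh((len(jax.devices()), 1))
```

Keeping the fallback path exercised by tests (as the PR does) matters because the hybrid-mesh constructor rejects shapes that don't match the physical topology.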