axlearn issues

chore(axlearn): Fix typos

4

# Changes ## Typo fixes Tool-assisted, using `typos --format brief --write-changes **/*.py` from [typos-cli](https://github.com/crate-ci/typos). The rest of the effort was going through that output with a fine-tooth comb. Aside: If you attempt this...

tony

Add Dataflow Inference Examples

This PR adds two examples for running batch inference on Dataflow: 1. Using a custom model handler for JAX models 2. Using a built-in HuggingFace model handler These pipelines can...

jiya-zhang

Heap profiler testing

ruomingp

Te flash attention

For debugging purposes.

kelvin-zou

Add new model config for smaller tests

1

Adds a new model configuration for text experiments. The goal is to get an early-termination model for fuji-test to accelerate infrastructure validation. + @jiya-zhang

jesus-orozco

Gradient Accumulation in Axlearn

Gradient accumulation allows training with larger batch sizes without scaling out. Adds a new learner type ```learner.klass: 'axlearn.common.learner.AccumulatedLearner'```. At a high level, the optimization does the following: 1. Input batch...

apoorvtintin
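
For readers unfamiliar with the technique named in the entry above, here is a minimal sketch of gradient accumulation in plain Python. It is not the `AccumulatedLearner` implementation; the toy model `y = w * x`, the function names `grad_mse` and `accumulated_grad`, and the learning rate are all hypothetical and chosen only for illustration. It assumes equal-sized microbatches and a mean-reduced loss, in which case averaging the per-microbatch gradients reproduces the full-batch gradient.

```python
# Hypothetical illustration of gradient accumulation; none of these
# names come from axlearn.

def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, num_microbatches):
    """Average per-microbatch gradients (assumes equal-sized microbatches)."""
    assert len(xs) % num_microbatches == 0, "batch must split evenly"
    size = len(xs) // num_microbatches
    total = 0.0
    for i in range(num_microbatches):
        lo, hi = i * size, (i + 1) * size
        # Only one microbatch is processed at a time, so peak memory
        # scales with the microbatch size, not the full batch size.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi])
    return total / num_microbatches


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)                               # one large batch
accum = accumulated_grad(w, xs, ys, num_microbatches=2)  # two microbatches

# A single optimizer step is applied after all microbatches are seen.
w_next = w - 0.1 * accum
```

The two gradients agree up to floating-point rounding, which is why accumulation can stand in for a larger batch on the same hardware at the cost of extra sequential steps.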

Dataflow changes

2

I changed a few things in this PR: 1. Add a Dockerfile entrypoint for Dataflow (needed for the Dataflow worker to start up successfully) 2. Mount gcloud config folder to...

jiya-zhang