axlearn
An integration of the Orbax checkpointer.
Depends on https://github.com/apple/axlearn/pull/650.
The relevant changes are in checkpointer.py, checkpointer_test.py, and pyproject.toml. This PR also depends on an unreleased Orbax commit (for concurrency-bounded serialization).
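For readers unfamiliar with the term, here is a minimal sketch of what concurrency-bounded serialization means in general: many items are serialized asynchronously, but only a fixed number are in flight at once. This is not Orbax's or axlearn's implementation. The names `serialize_all`, `serialize_one`, and `max_concurrent` are hypothetical, and `repr(...)` stands in for real array serialization.

```python
import asyncio


async def serialize_all(state, max_concurrent=2):
    """Serialize every entry of `state`, with at most `max_concurrent` in flight.

    Hypothetical helper for illustration only; not an Orbax or axlearn API.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0  # highest number of simultaneous serializations observed

    async def serialize_one(name, value):
        nonlocal in_flight, peak
        # The semaphore blocks here once `max_concurrent` tasks hold it.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for real I/O; yields to other tasks
            data = repr(value).encode("utf-8")  # stand-in for real serialization
            in_flight -= 1
            return name, data

    results = await asyncio.gather(
        *(serialize_one(name, value) for name, value in state.items())
    )
    return dict(results), peak


# Toy "checkpoint state" with four entries, serialized two at a time.
state = {"w": [1.0, 2.0], "b": [0.5], "step": 3, "lr": 0.1}
serialized, peak = asyncio.run(serialize_all(state, max_concurrent=2))
```

Bounding the number of in-flight serializations caps how much data is held in host memory at any moment, at the cost of some parallelism.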
Thanks Mark! Adding @cpgaffney1 to take a look as well (this will be the PR that gets merged for Orbax fyi)