Skip checkpointing at step=0
Description
- Skip checkpointing at step=0
- Take the absolute value when logging the max numerical diff in forward_pass_checker
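To illustrate why the absolute value matters for the max-diff log, here is a minimal sketch. It is not the code in forward_pass_checker; `max_abs_diff` and the sample values are hypothetical, and plain Python lists stand in for whatever arrays the checker compares.

```python
def max_abs_diff(expected, actual):
    """Return the largest absolute elementwise difference.

    Hypothetical helper for illustration; not the PR's actual code.
    """
    if len(expected) != len(actual):
        raise ValueError("sequences must have the same length")
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0.0)


# Made-up values: the largest deviation here is negative.
expected = [0.10, 0.50, -0.30]
actual = [0.10, 0.20, -0.25]

# Without abs, the max of the signed differences picks the small positive
# one (roughly 0.05) and hides the large negative one.
signed_max = max(a - e for e, a in zip(expected, actual))

# With abs, the log reports the true worst-case deviation (roughly 0.3).
abs_max = max_abs_diff(expected, actual)

print(f"signed max diff: {signed_max:.4f}, max abs diff: {abs_max:.4f}")
```

A signed maximum can under-report a mismatch whenever the worst deviation is negative, which is presumably the case the `abs` change addresses.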
Tests
Integration tests
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [x] I have performed a self-review of my code.
- [x] I have necessary comments in my code, particularly in hard-to-understand areas.
- [x] I have run end-to-end tests and provided workload links above if applicable.
- [x] I have made or will make corresponding changes to the doc if needed.
@xuefgu originally added the checkpoint at step 0, do you have strong opinions about removing it?
No objections. My ancient change, if memory serves, was only to avoid calling save_checkpoint when step was not a multiple of the interval.
Two nits on the PR though:
- We could place the `step != 0` check in the front.
- Line 136 could benefit from the same check.
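The first nit could look roughly like the sketch below. The helper name `should_save_checkpoint` and its parameters are assumptions made for illustration; the real condition in the PR, including the one at line 136, may be structured differently.

```python
def should_save_checkpoint(step, interval):
    """Decide whether to save at this step.

    Hypothetical helper: skips step 0 and otherwise saves only on
    multiples of the interval.
    """
    if interval <= 0:
        # Assumption: a non-positive interval disables checkpointing.
        return False
    # The step != 0 check comes first, so the intent (never save at step 0)
    # is visible up front and the modulo is skipped for step 0.
    return step != 0 and step % interval == 0


# Steps in 0..30 that would trigger a save with an interval of 10.
saved_steps = [s for s in range(0, 31) if should_save_checkpoint(s, 10)]
print(saved_steps)
```

Pulling the condition into one helper would also let both call sites share the same check, which is one way to address the second nit.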
This PR has been automatically marked as stale because it has not had recent activity. It will be closed soon if no further activity occurs. Thank you for your contributions.