Davis Wertheimer
The current dataloader still causes gradual, asymptotic slowdowns, likely because `n_workers` is fixed to 0 in the dataloader. This forces the main process to also handle dataloading in a...
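As a rough illustration of the suspected issue (not the PR's actual fix), the sketch below contrasts `num_workers=0`, where the training process performs all loading and preprocessing inline, with offloading that work to background worker processes; the dataset and parameter values are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical dataset standing in for the real streaming dataset."""
    def __init__(self, n=10_000):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        # Stand-in for preprocessing that would otherwise run on the main process.
        return torch.randn(1024)

# num_workers=0: the training loop blocks on loading/preprocessing every step.
blocking_loader = DataLoader(ToyDataset(), batch_size=8, num_workers=0)

# num_workers>0: loading runs in worker processes and overlaps with compute.
overlapped_loader = DataLoader(
    ToyDataset(),
    batch_size=8,
    num_workers=4,            # offload preprocessing to 4 worker processes
    persistent_workers=True,  # keep workers alive between epochs
    prefetch_factor=2,        # each worker keeps 2 batches ready
)
```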
Add support for speculator training, piggybacking off the existing training utilities. Training script and speculator-specific utilities are inside the new `speculator` subfolder. Uses distributed setup, checkpointing, and dataloaders from this...
This PR introduces an experimental PyTorch-native dataloader from IBM that is distributed, stateful, checkpointable, composable and rescalable. It is intended for use in large-scale model pretraining, particularly in research settings...
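A minimal sketch of the composability and statefulness ideas, using hypothetical class names rather than the PR's actual pipeline stages: each stage wraps an inner dataset, contributes its own state, and the whole pipeline checkpoints as one nested dict.

```python
import random

class StatefulStage:
    """Hypothetical pipeline stage: wraps an inner iterable and exposes
    checkpointable state, so stages compose freely."""
    def __init__(self, inner=None):
        self.inner = inner

    def state_dict(self):
        inner_state = self.inner.state_dict() if isinstance(self.inner, StatefulStage) else {}
        return {"inner": inner_state, **self._local_state()}

    def load_state_dict(self, state):
        if isinstance(self.inner, StatefulStage):
            self.inner.load_state_dict(state["inner"])
        self._load_local_state(state)

    def _local_state(self):
        return {}

    def _load_local_state(self, state):
        pass

class CountingSource(StatefulStage):
    """Toy source stage: yields integers and remembers where it left off."""
    def __init__(self):
        super().__init__()
        self.cursor = 0

    def __iter__(self):
        while True:
            yield self.cursor
            self.cursor += 1

    def _local_state(self):
        return {"cursor": self.cursor}

    def _load_local_state(self, state):
        self.cursor = state["cursor"]

class ShuffleBuffer(StatefulStage):
    """Toy shuffle stage: its in-flight buffer is part of the checkpoint,
    so resuming neither drops nor repeats documents."""
    def __init__(self, inner, buffer_size=4, seed=0):
        super().__init__(inner)
        self.buffer, self.buffer_size = [], buffer_size
        self.rng = random.Random(seed)

    def __iter__(self):
        for item in self.inner:
            self.buffer.append(item)
            if len(self.buffer) >= self.buffer_size:
                yield self.buffer.pop(self.rng.randrange(len(self.buffer)))

    def _local_state(self):
        return {"buffer": list(self.buffer)}

    def _load_local_state(self, state):
        self.buffer = list(state["buffer"])

pipeline = ShuffleBuffer(CountingSource())
```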
Implement [muP scaling](https://arxiv.org/abs/2203.03466) for Llama models. Model follows muP scaling laws but introduces the minimal set of extra tunable hyperparameters that allows us to recover prior behavior - thus may...
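For context, a minimal sketch of the kinds of width-dependent multipliers muP introduces; the helper and hyperparameter names below are illustrative, not the PR's.

```python
import math

def mup_multipliers(d_model, base_d_model=256, head_dim=64):
    """Hypothetical helper computing muP-style multipliers relative to a base width."""
    m_width = d_model / base_d_model
    return {
        # Attention logits scale as 1/d_head under muP, instead of 1/sqrt(d_head).
        "attn_scale": 1.0 / head_dim,
        # Output (unembedding) logits are damped by the width multiplier.
        "output_mult": 1.0 / m_width,
        # Under Adam, hidden-weight learning rates shrink with width.
        "hidden_lr_scale": 1.0 / m_width,
        # Hidden-weight init std shrinks with sqrt(width) relative to the base model.
        "hidden_init_scale": 1.0 / math.sqrt(m_width),
    }
```

Presumably, "recover prior behavior" means that leaving the extra hyperparameters at their standard-parameterization defaults trains the model exactly as before; the sketch only illustrates typical muP-mode values.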
Current code prints multiple warnings from each GPU at the start of training, which clutters up the log. Updates the dataloader and process group constructors to eliminate these warnings, respectively: ...
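The exact warnings are truncated above; purely as an illustration (not necessarily what this PR does), one common pattern for quieting per-rank startup noise is to bind the process group to the local device and only surface library warnings on rank 0.

```python
import os
import warnings

import torch
import torch.distributed as dist

# Assumes a torchrun-style launch where LOCAL_RANK is set.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

# On recent PyTorch (the `device_id` argument is assumed available, ~2.3+),
# binding the process group to the local device at construction time avoids
# some per-rank device-mapping warnings.
dist.init_process_group(
    backend="nccl",
    device_id=torch.device(f"cuda:{local_rank}"),
)

# Surface library warnings only on rank 0 so the startup log stays readable.
if dist.get_rank() != 0:
    warnings.filterwarnings("ignore")
```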
A collection of dataloader updates and fixes mirrored from the torchtitan repo. Changes include:

FIXES FOR HANGS AND FREEZES:
- Truncate long text docs to 1M characters (see the sketch after this list)
- Allow LCG...
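A minimal sketch of the first fix listed above, with a hypothetical helper name:

```python
MAX_DOC_CHARS = 1_000_000  # documents longer than this are truncated before tokenization

def truncate_doc(text: str, max_chars: int = MAX_DOC_CHARS) -> str:
    """Hypothetical helper: cap raw document length so a single pathological
    document cannot stall a tokenizer worker and freeze the pipeline."""
    return text if len(text) <= max_chars else text[:max_chars]
```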
When the dataloader loads from checkpoint, it expects a path to the checkpoints directory, from which it pulls the most recent checkpoint folder and loads the relevant data. This is...
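A minimal sketch of the "most recent checkpoint folder" lookup this behavior implies; the directory layout and `step_<n>` naming scheme are assumptions, not necessarily what the repo uses.

```python
import os
import re

def latest_checkpoint(ckpt_dir: str):
    """Hypothetical helper: pick the checkpoint subfolder with the highest step number."""
    pattern = re.compile(r"step_(\d+)$")  # assumed naming scheme, e.g. step_1000/
    candidates = []
    for name in os.listdir(ckpt_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(ckpt_dir, name)):
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    _, newest = max(candidates)
    return os.path.join(ckpt_dir, newest)
```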
Adds support for FIM training (https://arxiv.org/pdf/2207.14255). Allows for SPM or PSM mode (or both) with `--fim_training` command arg. Passes unit tests but not yet tested with a small LLM. Will...
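For reference, a rough sketch of the PSM (prefix-suffix-middle) transform described in the paper; the sentinel strings and helper name are placeholders, and the real implementation operates on tokenizer-specific ids.

```python
import random

# Placeholder sentinel tokens; the real ids depend on the tokenizer.
PRE, MID, SUF = "<fim_prefix>", "<fim_middle>", "<fim_suffix>"

def apply_fim_psm(tokens, rng=random):
    """PSM sketch: split the document at two random points, then emit
    <fim_prefix> prefix <fim_suffix> suffix <fim_middle> middle,
    so the model learns to generate the middle span last."""
    if len(tokens) < 2:
        return list(tokens)
    i, j = sorted(rng.sample(range(len(tokens) + 1), 2))
    prefix, middle, suffix = tokens[:i], tokens[i:j], tokens[j:]
    return [PRE, *prefix, SUF, *suffix, MID, *middle]
```

SPM mode instead places the suffix segment ahead of the prefix (the paper discusses the exact sentinel placement); when both modes are enabled, the transform is typically chosen per document.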
Implements rescaling of checkpoints to different world sizes and numbers of workers. The user specifies the number of data partitions in advance, and when saving/loading checkpoints with a different total number of workers, stateful...
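A minimal sketch of the rescaling idea as described: the number of logical data partitions is fixed up front, each physical worker owns whichever partitions map to it under the current world size, and partition-level state can therefore be regrouped when the worker count changes. Names and the round-robin mapping are illustrative.

```python
def partitions_for_worker(worker_id: int, n_workers: int, n_partitions: int):
    """Hypothetical assignment of fixed logical partitions to workers.
    Because state is saved per partition, a checkpoint written with one
    worker count can be regrouped and loaded with another, as long as
    n_partitions stays the same."""
    assert n_partitions % n_workers == 0, "partitions should divide evenly across workers"
    return [p for p in range(n_partitions) if p % n_workers == worker_id]

# Example: 24 fixed partitions, saved with 8 total workers, reloaded with 6.
saved = {w: partitions_for_worker(w, 8, 24) for w in range(8)}
reloaded = {w: partitions_for_worker(w, 6, 24) for w in range(6)}
```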