chore(deps): update dependency accelerate to v0.34.2
This PR contains the following updates:
| Package | Change |
|---|---|
| [accelerate](https://github.com/huggingface/accelerate) | `==0.24.1` -> `==0.34.2` |
Release Notes
huggingface/accelerate (accelerate)
v0.34.2
v0.34.1: Patchfix
Bug fixes
- Fixes an issue where processed `DataLoaders` could no longer be pickled in #​3074 thanks to @​byi8220
- Fixes an issue when using FSDP where `default_transformers_cls_names_to_wrap` would separate `_no_split_modules` by characters instead of keeping it as a list of layer names in #​3075
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.34.0...v0.34.1
v0.34.0: StatefulDataLoader Support, FP8 Improvements, and PyTorch Updates!
Dependency Changes
- Updated Safetensors Requirement: The library now requires `safetensors` version 0.4.3.
- Added support for Numpy 2.0: The library now fully supports `numpy` 2.0.0.
Core
New Script Behavior Changes
- Process Group Management: PyTorch now requires users to destroy process groups after training. The `accelerate` library will handle this automatically with `accelerator.end_training()`, or you can do it manually using `PartialState().destroy_process_group()` (see the sketch after this list).
- MLU Device Support: Added support for saving and loading RNG states on MLU devices by @​huismiling
- NPU Support: Corrected backend and distributed settings when using `transfer_to_npu`, ensuring better performance and compatibility.
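A minimal sketch of the new teardown flow, assuming a standard single-`Accelerator` script (the training loop itself is elided):

```python
from accelerate import Accelerator, PartialState

accelerator = Accelerator()
# ... prepare model/optimizer/dataloaders and run the training loop ...

# end_training() now also destroys the process group, matching
# PyTorch's requirement to clean up after distributed training.
accelerator.end_training()

# Manual equivalent, if you handle teardown yourself:
# PartialState().destroy_process_group()
```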
DataLoader Enhancements
- Stateful DataLoader: We are excited to announce that early support has been added for the `StatefulDataLoader` from `torchdata`, allowing better handling of data loading states. Enable it by passing `use_stateful_dataloader=True` to the `DataLoaderConfiguration`; when calling `load_state()` the `DataLoader` will automatically be resumed from its last step, with no more having to iterate through passed batches (see the sketch after this list).
- Decoupled Data Loader Preparation: The `prepare_data_loader()` function is now independent of the `Accelerator`, giving you more flexibility towards which API levels you would like to use.
- XLA Compatibility: Added support for skipping initial batches when using XLA.
- Improved State Management: Bug fixes and enhancements for saving/loading `DataLoader` states, ensuring smoother training sessions.
- Epoch Setting: Introduced the `set_epoch` function for `MpDeviceLoaderWrapper`.
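A minimal sketch of enabling the stateful dataloader (the dataset and checkpoint directory are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

# Opt in to the torchdata-backed StatefulDataLoader.
dataloader_config = DataLoaderConfiguration(use_stateful_dataloader=True)
accelerator = Accelerator(dataloader_config=dataloader_config)

dataset = TensorDataset(torch.randn(64, 4))  # placeholder dataset
dataloader = accelerator.prepare(DataLoader(dataset, batch_size=8))

# ... train, periodically calling accelerator.save_state("ckpt") ...

# On resume, load_state() also restores the dataloader's position,
# so there is no need to fast-forward through already-seen batches.
accelerator.load_state("ckpt")
```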
FP8 Training Improvements
- Enhanced FP8 Training: Fully Sharded Data Parallelism (FSDP) and DeepSpeed support now work seamlessly with `TransformerEngine` FP8 training, including better defaults for the quantized FP8 weights (see the sketch after this list).
- Integration baseline: We've added a new suite of examples and benchmarks to ensure that our `TransformerEngine` integration works exactly as intended. These scripts run one half using 🤗 Accelerate's integration, the other with raw `TransformerEngine`, providing users with a nice example of what we do under the hood with accelerate, and a good sanity check to make sure nothing breaks down over time. Find them here.
- Import Fixes: Resolved issues with import checks for the Transformer Engine that had downstream issues.
- FP8 Docker Images: We've added new docker images for `TransformerEngine` and `accelerate` as well. Use `docker pull huggingface/accelerate@gpu-fp8-transformerengine` to quickly get an environment going.
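A hedged sketch of how TransformerEngine-backed FP8 is typically requested in Accelerate; the `FP8RecipeKwargs` handler and its `backend="te"` argument come from the existing kwargs-handler API rather than these release notes, so treat them as assumptions:

```python
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

# Request TransformerEngine FP8 mixed precision; with this release the
# same setup can now be combined with FSDP or DeepSpeed.
accelerator = Accelerator(
    mixed_precision="fp8",
    kwargs_handlers=[FP8RecipeKwargs(backend="te")],
)
```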
torchpippy no more, long live torch.distributed.pipelining
- With the latest PyTorch release, `torchpippy` is now fully integrated into torch core, and as a result we are exclusively supporting the PyTorch implementation from now on.
- There are breaking examples and changes that come from this shift. Namely:
  - Tracing of inputs is done with the shape each GPU will see, rather than the size of the total batch. So for 2 GPUs, one should pass in an input of `[1, n, n]` rather than `[2, n, n]` as before (see the sketch after this list).
  - We no longer support Encoder/Decoder models. PyTorch tracing for `pipelining` no longer supports encoder/decoder models, so the `t5` example has been removed.
  - Computer vision model support currently does not work: there are some tracing issues regarding resnets that we are actively looking into.
- If either of these changes is too breaking, we recommend pinning your accelerate version. If the encoder/decoder model support is actively blocking your inference using pippy, please open an issue and let us know. We can look towards potentially adding back the old `torchpippy` support if needed.
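A hedged sketch of the new per-GPU tracing shape, using Accelerate's pipeline-parallel helper; `prepare_pippy` and its `example_args` parameter come from the existing inference API, and the GPT-2 model choice is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM

from accelerate.inference import prepare_pippy

model = AutoModelForCausalLM.from_pretrained("gpt2")

# With 2 GPUs, trace with the shape ONE GPU sees: batch size 1, not 2.
example_input = torch.randint(0, 50257, (1, 128))  # previously (2, 128)
model = prepare_pippy(model, example_args=(example_input,))
```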
Fully Sharded Data Parallelism (FSDP)
- Environment Flexibility: Environment variables are now fully optional for FSDP, simplifying configuration. You can now fully create a `FullyShardedDataParallelPlugin` yourself manually, with no need for environment patching:

```python
from accelerate import FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(...)
```
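The plugin is then passed straight to the `Accelerator`; a minimal continuation (the constructor arguments elided above stay elided):

```python
from accelerate import Accelerator

accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```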
- FSDP RAM-efficient loading: Added a utility to enable RAM-efficient model loading (by setting the proper environment variable). This is generally needed if you are not using `accelerate launch` and need to ensure the env variables are set up properly for model loading:

```python
from accelerate.utils import enable_fsdp_ram_efficient_loading, disable_fsdp_ram_efficient_loading

enable_fsdp_ram_efficient_loading()
```
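The imported counterpart, `disable_fsdp_ram_efficient_loading()`, reverts the environment change when you no longer want the RAM-efficient path.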
- Model State Dict Management: Enhanced support for unwrapping model state dicts in FSDP, making it easier to manage distributed models.
New Examples
- Configuration and Models: Improved configuration handling and introduced a configuration zoo for easier experimentation. You can learn more here. This was largely inspired by the `axolotl` library, so very big kudos to their wonderful work.
- FSDP + SLURM Example: Added a minimal configuration example for running jobs with SLURM and using FSDP.
Bug Fixes
- Fix bug of clip_grad_norm_ for xla fsdp by @​hanwen-sun in https://github.com/huggingface/accelerate/pull/2941
- Explicit check for `step` when loading the state by @​muellerzr in https://github.com/huggingface/accelerate/pull/2992
- Fix `find_tied_params` for models with shared layers by @​qubvel in https://github.com/huggingface/accelerate/pull/2986
- clear memory after offload by @​SunMarc in https://github.com/huggingface/accelerate/pull/2994
- fix default value for rank size in cpu threads_per_process assignment logic by @​rbrugaro in https://github.com/huggingface/accelerate/pull/3009
- Fix batch_sampler maybe None error by @​candlewill in https://github.com/huggingface/accelerate/pull/3025
- Do not import `transformer_engine` on import by @​oraluben in https://github.com/huggingface/accelerate/pull/3056
- Fix torchvision to be compatible with torch version in CI by @​SunMarc in https://github.com/huggingface/accelerate/pull/2982
- Fix gated test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2993
- Fix typo on warning str: "on the meta device device" -> "on the meta device" by @​HeAndres in https://github.com/huggingface/accelerate/pull/2997
- Fix deepspeed tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/3003
- Fix torch version check by @​muellerzr in https://github.com/huggingface/accelerate/pull/3024
- Fix fp8 benchmark on single GPU by @​muellerzr in https://github.com/huggingface/accelerate/pull/3032
- Fix typo in comment by @​zmoki688 in https://github.com/huggingface/accelerate/pull/3045
- Speed up tests by shaving off subprocess when not needed by @​muellerzr in https://github.com/huggingface/accelerate/pull/3042
- Remove `skip_first_batches` support for StatefulDataloader and fix all the tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/3068
New Contributors
- @​byi8220 made their first contribution in https://github.com/huggingface/accelerate/pull/2957
- @​alex-jw-brooks made their first contribution in https://github.com/huggingface/accelerate/pull/2959
- @​XciD made their first contribution in https://github.com/huggingface/accelerate/pull/2981
- @​hanwen-sun made their first contribution in https://github.com/huggingface/accelerate/pull/2941
- @​HeAndres made their first contribution in https://github.com/huggingface/accelerate/pull/2997
- @​yitongh made their first contribution in https://github.com/huggingface/accelerate/pull/2966
- @​qubvel made their first contribution in https://github.com/huggingface/accelerate/pull/2986
- @​rbrugaro made their first contribution in https://github.com/huggingface/accelerate/pull/3009
- @​candlewill made their first contribution in https://github.com/huggingface/accelerate/pull/3025
- @​siddk made their first contribution in https://github.com/huggingface/accelerate/pull/3047
- @​oraluben made their first contribution in https://github.com/huggingface/accelerate/pull/3056
- @​tmm1 made their first contribution in https://github.com/huggingface/accelerate/pull/3055
- @​zmoki688 made their first contribution in https://github.com/huggingface/accelerate/pull/3045
Full Changelog:
- Require safetensors>=0.4.3 by @​byi8220 in https://github.com/huggingface/accelerate/pull/2957
- Fix torchvision to be compatible with torch version in CI by @​SunMarc in https://github.com/huggingface/accelerate/pull/2982
- Enable Unwrapping for Model State Dicts (FSDP) by @​alex-jw-brooks in https://github.com/huggingface/accelerate/pull/2959
- chore: Update runs-on configuration for CI workflows by @​XciD in https://github.com/huggingface/accelerate/pull/2981
- add MLU devices for rng state saving and loading. by @​huismiling in https://github.com/huggingface/accelerate/pull/2940
- remove .md to allow proper linking by @​nbroad1881 in https://github.com/huggingface/accelerate/pull/2977
- Fix bug of clip_grad_norm_ for xla fsdp by @​hanwen-sun in https://github.com/huggingface/accelerate/pull/2941
- Fix gated test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2993
- Explicit check for `step` when loading the state by @​muellerzr in https://github.com/huggingface/accelerate/pull/2992
- Fix typo on warning str: "on the meta device device" -> "on the meta device" by @​HeAndres in https://github.com/huggingface/accelerate/pull/2997
- Support skip_first_batches for XLA by @​yitongh in https://github.com/huggingface/accelerate/pull/2966
- clear memory after offload by @​SunMarc in https://github.com/huggingface/accelerate/pull/2994
- Fix deepspeed tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/3003
- Make env variables optional for FSDP by @​muellerzr in https://github.com/huggingface/accelerate/pull/2998
- Add small util to enable FSDP offloading quickly by @​muellerzr in https://github.com/huggingface/accelerate/pull/3006
- update version to 0.34.dev0 by @​SunMarc in https://github.com/huggingface/accelerate/pull/3007
- Fix `find_tied_params` for models with shared layers by @​qubvel in https://github.com/huggingface/accelerate/pull/2986
- Enable FSDP & Deepspeed + FP8 by @​muellerzr in https://github.com/huggingface/accelerate/pull/2983
- fix default value for rank size in cpu threads_per_process assignment logic by @​rbrugaro in https://github.com/huggingface/accelerate/pull/3009
- Wrong import check for TE by @​muellerzr in https://github.com/huggingface/accelerate/pull/3016
- destroy process group in `end_training` by @​SunMarc in https://github.com/huggingface/accelerate/pull/3012
- Tweak defaults for quantized-typed FP8 TE weights by @​muellerzr in https://github.com/huggingface/accelerate/pull/3018
- Set correct NPU backend and distributed_type when using transfer_to_npu by @​ArthurinRUC in https://github.com/huggingface/accelerate/pull/3021
- Fix torch version check by @​muellerzr in https://github.com/huggingface/accelerate/pull/3024
- Add end_training/destroy_pg to everything and unpin numpy by @​muellerzr in https://github.com/huggingface/accelerate/pull/3030
- Improve config handling and add a zoo by @​muellerzr in https://github.com/huggingface/accelerate/pull/3029
- Add early support for `torchdata.stateful_dataloader.StatefulDataLoader` within the `Accelerator` by @​byi8220 in https://github.com/huggingface/accelerate/pull/2895
- Fix fp8 benchmark on single GPU by @​muellerzr in https://github.com/huggingface/accelerate/pull/3032
- Fix batch_sampler maybe None error by @​candlewill in https://github.com/huggingface/accelerate/pull/3025
- Fixup dataloader state dict bugs + incorporate load/save_state API by @​muellerzr in https://github.com/huggingface/accelerate/pull/3034
- Decouple `prepare_data_loader()` from Accelerator by @​siddk in https://github.com/huggingface/accelerate/pull/3047
- Update CONTRIBUTING.md Setup Instructions by @​siddk in https://github.com/huggingface/accelerate/pull/3046
- Add a SLURM example with minimal config by @​muellerzr in https://github.com/huggingface/accelerate/pull/2950
- Add FP8 docker images by @​muellerzr in https://github.com/huggingface/accelerate/pull/3048
- Update torchpippy by @​muellerzr in https://github.com/huggingface/accelerate/pull/2938
- Do not import `transformer_engine` on import by @​oraluben in https://github.com/huggingface/accelerate/pull/3056
- use duck-typing to ensure underlying optimizer supports schedulefree hooks by @​tmm1 in https://github.com/huggingface/accelerate/pull/3055
- Fix typo in comment by @​zmoki688 in https://github.com/huggingface/accelerate/pull/3045
- add set_epoch for MpDeviceLoaderWrapper by @​hanwen-sun in https://github.com/huggingface/accelerate/pull/3053
- Speed up tests by shaving off subprocess when not needed by @​muellerzr in https://github.com/huggingface/accelerate/pull/3042
- Remove `skip_first_batches` support for StatefulDataloader and fix all the tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/3068
Detailed Full Changelog:
- https://github.com/huggingface/accelerate/compare/v0.33.0...v0.34.0
v0.33.0: MUSA backend support and bugfixes
MUSA backend support and bugfixes
Small release this month, with key focuses on some added support for backends and bugs:
- Support MUSA (Moore Threads GPU) backend in accelerate by @​fmo-mt in https://github.com/huggingface/accelerate/pull/2917
- Allow multiple process per device by @​cifkao in https://github.com/huggingface/accelerate/pull/2916
- Add `torch.float8_e4m3fn` format `dtype_byte_size` by @​SunMarc in https://github.com/huggingface/accelerate/pull/2945
- Properly handle Params4bit in set_module_tensor_to_device by @​matthewdouglas in https://github.com/huggingface/accelerate/pull/2934
What's Changed
- [tests] fix bug in torch_device by @​faaany in https://github.com/huggingface/accelerate/pull/2909
- Fix slowdown on init with `device_map="auto"` by @​muellerzr in https://github.com/huggingface/accelerate/pull/2914
- fix: bug where `multi_gpu` was being set and warning being printed even with `num_processes=1` by @​HarikrishnanBalagopal in https://github.com/huggingface/accelerate/pull/2921
- Better error when a bad directory is given for weight merging by @​muellerzr in https://github.com/huggingface/accelerate/pull/2852
- add xpu device check before moving tensor directly to xpu device by @​faaany in https://github.com/huggingface/accelerate/pull/2928
- Add huggingface_hub version to setup.py by @​nullquant in https://github.com/huggingface/accelerate/pull/2932
- Correct loading of models with shared tensors when using accelerator.load_state() by @​jkuntzer in https://github.com/huggingface/accelerate/pull/2875
- Hotfix PyTorch Version Installation in CI Workflow for Minimum Version Matrix by @​yhna940 in https://github.com/huggingface/accelerate/pull/2889
- Fix import test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2931
- Consider pynvml available when installed through the nvidia-ml-py distribution by @​matthewdouglas in https://github.com/huggingface/accelerate/pull/2936
- Improve test reliability for Accelerator.free_memory() by @​matthewdouglas in https://github.com/huggingface/accelerate/pull/2935
- delete CCL env var setting by @​Liangliang-Ma in https://github.com/huggingface/accelerate/pull/2927
- feat(ci): add `pip` caching in CI by @​SauravMaheshkar in https://github.com/huggingface/accelerate/pull/2952
New Contributors
- @​HarikrishnanBalagopal made their first contribution in https://github.com/huggingface/accelerate/pull/2921
- @​fmo-mt made their first contribution in https://github.com/huggingface/accelerate/pull/2917
- @​nullquant made their first contribution in https://github.com/huggingface/accelerate/pull/2932
- @​cifkao made their first contribution in https://github.com/huggingface/accelerate/pull/2916
- @​jkuntzer made their first contribution in https://github.com/huggingface/accelerate/pull/2875
- @​matthewdouglas made their first contribution in https://github.com/huggingface/accelerate/pull/2936
- @​Liangliang-Ma made their first contribution in https://github.com/huggingface/accelerate/pull/2927
- @​SauravMaheshkar made their first contribution in https://github.com/huggingface/accelerate/pull/2952
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.32.1...v0.33.0
v0.32.1
v0.32.0: Profilers, new hooks, speedups, and more!
Core
- Utilize shard saving from the `huggingface_hub` rather than our own implementation (https://github.com/huggingface/accelerate/pull/2795)
- Refactor logging to use logger in `dispatch_model` (https://github.com/huggingface/accelerate/pull/2855)
- The `Accelerator.step` number is now restored when using `save_state` and `load_state` (https://github.com/huggingface/accelerate/pull/2765)
- A new profiler has been added, allowing users to collect performance metrics during model training and inference, including detailed analysis of execution time and memory consumption; the results can then be viewed in Chrome's tracing tool. Read more about it here (https://github.com/huggingface/accelerate/pull/2883); see the sketch after this list.
- Reduced import times for `import accelerate` and any other major core import by 68%; it should now be only slightly longer than `import torch` (https://github.com/huggingface/accelerate/pull/2845)
- Fixed a bug in `get_backend` and added a `clear_device_cache` utility (https://github.com/huggingface/accelerate/pull/2857)
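A hedged sketch of the profiler; the `ProfileKwargs` handler, its fields, and the `accelerator.profile()` context manager are assumptions based on Accelerate's kwargs-handler pattern rather than details spelled out in these notes:

```python
from accelerate import Accelerator, ProfileKwargs

# Record CPU and CUDA activity and emit a Chrome trace file.
profile_kwargs = ProfileKwargs(
    activities=["cpu", "cuda"],
    output_trace_dir="trace",
)
accelerator = Accelerator(kwargs_handlers=[profile_kwargs])

with accelerator.profile() as prof:
    pass  # training or inference steps go here

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```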
Distributed Data Parallelism
- Introduce DDP communication hooks to have more flexibility in how gradients are communicated across workers, overriding the standard `allreduce` (https://github.com/huggingface/accelerate/pull/2841); see the sketch after this list.
- Make `log_line_prefix_template` optional for the `notebook_launcher` (https://github.com/huggingface/accelerate/pull/2888)
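A hedged sketch of opting into a communication hook; the `comm_hook` field and the `DDPCommunicationHookType` enum are assumptions about how PR 2841 exposes this:

```python
from accelerate import Accelerator, DistributedDataParallelKwargs
from accelerate.utils import DDPCommunicationHookType

# Compress gradients to fp16 during all-reduce instead of the default hook.
ddp_kwargs = DistributedDataParallelKwargs(comm_hook=DDPCommunicationHookType.FP16)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```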
FSDP
- If the output directory doesn't exist when using `accelerate merge-weights`, one will be automatically created (https://github.com/huggingface/accelerate/pull/2854)
- When merging weights, the default is now `.safetensors` (https://github.com/huggingface/accelerate/pull/2853)
XPU
- Migrate to pytorch's native XPU backend on `torch>=2.4` (https://github.com/huggingface/accelerate/pull/2825)
- Add `@require_triton` test decorator and enable `test_dynamo` to work on xpu (https://github.com/huggingface/accelerate/pull/2878)
- Fixed `load_state_dict` not working on `xpu` and refined the xpu `safetensors` version check (https://github.com/huggingface/accelerate/pull/2879)
XLA
- Added support for XLA Dynamo backends for both training and inference (https://github.com/huggingface/accelerate/pull/2892)
Examples
- Added a new multi-cpu SLURM example using `accelerate launch` (https://github.com/huggingface/accelerate/pull/2902)
Full Changelog
- Use shard saving from huggingface_hub by @​SunMarc in https://github.com/huggingface/accelerate/pull/2795
- doc: fix link by @​imba-tjd in https://github.com/huggingface/accelerate/pull/2844
- Revert "Slight rename" by @​SunMarc in https://github.com/huggingface/accelerate/pull/2850
- remove warning hook addede during dispatch_model by @​SunMarc in https://github.com/huggingface/accelerate/pull/2843
- Remove underlines between badges by @​novialriptide in https://github.com/huggingface/accelerate/pull/2851
- Auto create dir when merging FSDP weights by @​helloworld1 in https://github.com/huggingface/accelerate/pull/2854
- Add DDP Communication Hooks by @​yhna940 in https://github.com/huggingface/accelerate/pull/2841
- Refactor logging to use logger in `dispatch_model` by @​panjd123 in https://github.com/huggingface/accelerate/pull/2855
- xpu: support xpu backend from stock pytorch (>=2.4) by @​dvrogozh in https://github.com/huggingface/accelerate/pull/2825
- Drop torch re-imports in npu and mlu paths by @​dvrogozh in https://github.com/huggingface/accelerate/pull/2856
- Default FSDP weights merge to safetensors by @​helloworld1 in https://github.com/huggingface/accelerate/pull/2853
- [tests] fix bug in `test_tracking.ClearMLTest` by @​faaany in https://github.com/huggingface/accelerate/pull/2863
- [tests] use `torch_device` instead of `0` for device check by @​faaany in https://github.com/huggingface/accelerate/pull/2861
- [tests] skip bnb-related tests instead of failing on xpu by @​faaany in https://github.com/huggingface/accelerate/pull/2860
- Potentially fix tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/2862
- [tests] enable XPU backend for `test_zero3_integration` by @​faaany in https://github.com/huggingface/accelerate/pull/2864
- Support saving and loading of step while saving and loading state by @​bipinKrishnan in https://github.com/huggingface/accelerate/pull/2765
- Add Profiler Support for Performance Analysis by @​yhna940 in https://github.com/huggingface/accelerate/pull/2883
- Speed up imports and add a CI by @​muellerzr in https://github.com/huggingface/accelerate/pull/2845
- Make `log_line_prefix_template` Optional in Elastic Launcher for Backward Compatibility by @​yhna940 in https://github.com/huggingface/accelerate/pull/2888
- Add XLA Dynamo backends for training and inference by @​johnsutor in https://github.com/huggingface/accelerate/pull/2892
- Added a MultiCPU SLURM example using Accelerate Launch and MPIRun by @​okhleif-IL in https://github.com/huggingface/accelerate/pull/2902
- make more cuda-only tests device-agnostic by @​faaany in https://github.com/huggingface/accelerate/pull/2876
- fix mlu device longTensor bugs by @​huismiling in https://github.com/huggingface/accelerate/pull/2887
- add `require_triton` and enable `test_dynamo` work on xpu by @​faaany in https://github.com/huggingface/accelerate/pull/2878
- fix `load_state_dict` for xpu and refine xpu safetensor version check by @​faaany in https://github.com/huggingface/accelerate/pull/2879
- Fix get_backend bug and add clear_device_cache function by @​NurmaU in https://github.com/huggingface/accelerate/pull/2857
New Contributors
- @​McPatate made their first contribution in https://github.com/huggingface/accelerate/pull/2836
- @​imba-tjd made their first contribution in https://github.com/huggingface/accelerate/pull/2844
- @​novialriptide made their first contribution in https://github.com/huggingface/accelerate/pull/2851
- @​panjd123 made their first contribution in https://github.com/huggingface/accelerate/pull/2855
- @​dvrogozh made their first contribution in https://github.com/huggingface/accelerate/pull/2825
- @​johnsutor made their first contribution in https://github.com/huggingface/accelerate/pull/2892
- @​okhleif-IL made their first contribution in https://github.com/huggingface/accelerate/pull/2902
- @​NurmaU made their first contribution in https://github.com/huggingface/accelerate/pull/2857
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.31.0...v0.32.0
v0.31.0: Better support for sharded state dict with FSDP and Bugfixes
Core
- Set `timeout` default to PyTorch defaults based on backend by @​muellerzr in https://github.com/huggingface/accelerate/pull/2758
- fix duplicate elements in split_between_processes by @​hkunzhe in https://github.com/huggingface/accelerate/pull/2781
- Add Elastic Launch Support to `notebook_launcher` by @​yhna940 in https://github.com/huggingface/accelerate/pull/2788
- Fix Wrong use of sync_gradients used to implement sync_each_batch by @​fabianlim in https://github.com/huggingface/accelerate/pull/2790
FSDP
- Introduce shard-merging util for FSDP by @​muellerzr in https://github.com/huggingface/accelerate/pull/2772 (see the sketch after this list)
- Enable sharded state dict + offload to cpu resume by @​muellerzr in https://github.com/huggingface/accelerate/pull/2762
- Enable config for fsdp activation checkpointing by @​helloworld1 in https://github.com/huggingface/accelerate/pull/2779
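A hedged sketch of the shard-merging utility's Python entry point; the `merge_fsdp_weights` name and signature are assumptions about what PR 2772 introduced, and the paths are placeholders:

```python
from accelerate.utils import merge_fsdp_weights

# Merge sharded FSDP checkpoint files into a single (safetensors) checkpoint.
merge_fsdp_weights(
    checkpoint_dir="ckpt/pytorch_model_fsdp_0",  # placeholder path
    output_path="merged_model",                  # placeholder path
    safe_serialization=True,
)
```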
Megatron
- Upgrade huggingface's megatron to nvidia's megatron when use MegatronLMPlugin by @​zhangsheng377 in https://github.com/huggingface/accelerate/pull/2501
What's Changed
- Add feature to allow redirecting std streams into log files when using torchrun as the launcher. by @​lyuwen in https://github.com/huggingface/accelerate/pull/2740
- Update modeling.py by adding try-catch section to skip the unavailable devices by @​MeVeryHandsome in https://github.com/huggingface/accelerate/pull/2681
- Fixed the problem of incorrect conditional judgment statement when configuring enable_cpu_affinity by @​statelesshz in https://github.com/huggingface/accelerate/pull/2748
- Fix stacklevel in `logging` to log the actual user call site (instead of the call site inside the logger wrapper) of log functions by @​luowyang in https://github.com/huggingface/accelerate/pull/2730
- LOMO / FIX: Support multiple optimizers by @​younesbelkada in https://github.com/huggingface/accelerate/pull/2745
- Fix max_memory assignment by @​SunMarc in https://github.com/huggingface/accelerate/pull/2751
- Fix duplicate environment variable check in multi-cpu condition by @​yhna940 in https://github.com/huggingface/accelerate/pull/2752
- Simplify CLI args validation and ensure CLI args take precedence over config file. by @​Iain-S in https://github.com/huggingface/accelerate/pull/2757
- Fix sagemaker config by @​muellerzr in https://github.com/huggingface/accelerate/pull/2753
- fix cpu omp num threads set by @​jiqing-feng in https://github.com/huggingface/accelerate/pull/2755
- Revert "Simplify CLI args validation and ensure CLI args take precedence over config file." by @​muellerzr in https://github.com/huggingface/accelerate/pull/2763
- Enable sharded cpu resume by @​muellerzr in https://github.com/huggingface/accelerate/pull/2762
- Sets default to PyTorch defaults based on backend by @​muellerzr in https://github.com/huggingface/accelerate/pull/2758
- optimize get_module_leaves speed by @​BBuf in https://github.com/huggingface/accelerate/pull/2756
- fix minor typo by @​TemryL in https://github.com/huggingface/accelerate/pull/2767
- Fix small edge case in get_module_leaves by @​SunMarc in https://github.com/huggingface/accelerate/pull/2774
- Skip deepspeed test by @​SunMarc in https://github.com/huggingface/accelerate/pull/2776
- Enable config for fsdp activation checkpointing by @​helloworld1 in https://github.com/huggingface/accelerate/pull/2779
- Add arg from CLI to fix failing test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2783
- Skip tied weights disk offload test by @​SunMarc in https://github.com/huggingface/accelerate/pull/2782
- Introduce shard-merging util for FSDP by @​muellerzr in https://github.com/huggingface/accelerate/pull/2772
- FIX / FSDP : Guard fsdp utils for earlier PyTorch versions by @​younesbelkada in https://github.com/huggingface/accelerate/pull/2794
- Upgrade huggingface's megatron to nvidia's megatron when use MegatronLMPlugin by @​zhangsheng377 in https://github.com/huggingface/accelerate/pull/2501
- Fixup CLI test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2796
- fix duplicate elements in split_between_processes by @​hkunzhe in https://github.com/huggingface/accelerate/pull/2781
- Add Elastic Launch Support to `notebook_launcher` by @​yhna940 in https://github.com/huggingface/accelerate/pull/2788
- Fix Wrong use of sync_gradients used to implement sync_each_batch by @​fabianlim in https://github.com/huggingface/accelerate/pull/2790
- Fix type in accelerator.py by @​qgallouedec in https://github.com/huggingface/accelerate/pull/2800
- fix comet ml test by @​SunMarc in https://github.com/huggingface/accelerate/pull/2804
- New template by @​muellerzr in https://github.com/huggingface/accelerate/pull/2808
- Fix access error for torch.mps when using torch==1.13.1 on macOS by @​SunMarc in https://github.com/huggingface/accelerate/pull/2806
- 4-bit quantization meta device bias loading bug by @​SunMarc in https://github.com/huggingface/accelerate/pull/2805
- State dictionary retrieval from offloaded modules by @​blbadger in https://github.com/huggingface/accelerate/pull/2619
- add cuda dep for a test by @​SunMarc in https://github.com/huggingface/accelerate/pull/2820
- Remove out-dated xpu device check code in `get_balanced_memory` by @​faaany in https://github.com/huggingface/accelerate/pull/2826
- Fix DeepSpeed config validation error by changing `stage3_prefetch_bucket_size` value to an integer by @​adk9 in https://github.com/huggingface/accelerate/pull/2814
- Improve test speeds by up to 30% in multi-gpu settings by @​muellerzr in https://github.com/huggingface/accelerate/pull/2830
- monitor-interval, take 2 by @​muellerzr in https://github.com/huggingface/accelerate/pull/2833
- Optimize the megatron plugin by @​zhangsheng377 in https://github.com/huggingface/accelerate/pull/2822
- fix fstr format by @​Jintao-Huang in https://github.com/huggingface/accelerate/pull/2810
New Contributors
- @​lyuwen made their first contribution in https://github.com/huggingface/accelerate/pull/2740
- @​MeVeryHandsome made their first contribution in https://github.com/huggingface/accelerate/pull/2681
- @​luowyang made their first contribution in https://github.com/huggingface/accelerate/pull/2730
- @​Iain-S made their first contribution in https://github.com/huggingface/accelerate/pull/2757
- @​BBuf made their first contribution in https://github.com/huggingface/accelerate/pull/2756
- @​TemryL made their first contribution in https://github.com/huggingface/accelerate/pull/2767
- @​helloworld1 made their first contribution in https://github.com/huggingface/accelerate/pull/2779
- @​hkunzhe made their first contribution in https://github.com/huggingface/accelerate/pull/2781
- @​adk9 made their first contribution in https://github.com/huggingface/accelerate/pull/2814
- @​Jintao-Huang made their first contribution in https://github.com/huggingface/accelerate/pull/2810
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.30.1...v0.31.0
v0.30.1: Bugfixes
Patchfix
- Fix duplicate environment variable check in multi-cpu condition thanks to @​yhna940 in https://github.com/huggingface/accelerate/pull/2752
- Fix issue with missing values in the SageMaker config leading to not being able to launch in https://github.com/huggingface/accelerate/pull/2753
- Fix CPU OMP num threads setting thanks to @​jiqing-feng in https://github.com/huggingface/accelerate/pull/2755
- Fix FSDP checkpoint unable to resume when using offloading and sharded weights due to CUDA OOM when loading the optimizer and model https://github.com/huggingface/accelerate/pull/2762
- Fixed the problem of incorrect conditional judgment statement when configuring enable_cpu_affinity thanks to @​statelesshz in https://github.com/huggingface/accelerate/pull/2748
- Fix stacklevel in logging to log the actual user call site (instead of the call site inside the logger wrapper) of log functions thanks to @​luowyang in https://github.com/huggingface/accelerate/pull/2730
- Fix support for multiple optimizers when using LOMO thanks to @​younesbelkada in https://github.com/huggingface/accelerate/pull/2745
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.30.0...v0.30.1
v0.30.0: Advanced optimizer support, MoE DeepSpeed support, add upcasting for FSDP, and more
Core
- We've simplified the `tqdm` wrapper to make it fully passthrough: no need to have `tqdm(main_process_only, *args)`, it is now just `tqdm(*args)` and you can pass in `main_process_only` as a kwarg (see the sketch after this list).
- We've added support for advanced optimizer usage:
  - Schedule free optimizer introduced by Meta by @​muellerzr in https://github.com/huggingface/accelerate/pull/2631
  - LOMO optimizer introduced by OpenLMLab by @​younesbelkada in https://github.com/huggingface/accelerate/pull/2695
- Enable BF16 autocast to everything during FP8 and enable FSDP by @​muellerzr in https://github.com/huggingface/accelerate/pull/2655
- Support dataloader send_to_device calls to use no
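A minimal sketch of the simplified wrapper; the kwarg name `main_process_only` follows the wrapper's signature in `accelerate.utils` and should be treated as an assumption:

```python
from accelerate.utils import tqdm

# Fully passthrough: positional args go straight to tqdm, and the
# main-process gating is now an explicit keyword argument.
for batch in tqdm(range(100), main_process_only=True):
    pass  # work on the batch
```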
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻️ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
- [ ] If you want to rebase/retry this PR, check this box
This PR was generated by Mend Renovate. View the repository job log.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 37.00%. Comparing base (`6530de4`) to head (`8cc06e0`). Report is 1 commit behind head on main.
:exclamation: Current head `8cc06e0` differs from the pull request's most recent head `e01f6db`. Please upload reports for the commit `e01f6db` to get more accurate results.
Additional details and impacted files
```
@@           Coverage Diff           @@
##             main     #273   +/-   ##
=======================================
  Coverage   37.00%   37.00%
=======================================
  Files          23       23
  Lines        1481     1481
  Branches      202      202
=======================================
  Hits          548      548
  Misses        925      925
  Partials        8        8
```
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.