chore(deps): update dependency accelerate to v0.34.2
This PR contains the following updates:
| Package | Change |
|---|---|
| [accelerate](https://github.com/huggingface/accelerate) | `==0.24.1` -> `==0.34.2` |
Release Notes
huggingface/accelerate (accelerate)
v0.34.2
v0.34.1: Patchfix
Bug fixes
- Fixes an issue where processed `DataLoaders` could no longer be pickled in #​3074 thanks to @​byi8220
- Fixes an issue when using FSDP where `default_transformers_cls_names_to_wrap` would separate `_no_split_modules` by characters instead of keeping it as a list of layer names in #​3075
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.34.0...v0.34.1
v0.34.0: StatefulDataLoader Support, FP8 Improvements, and PyTorch Updates!
Dependency Changes
- Updated Safetensors Requirement: The library now requires `safetensors` version 0.4.3.
- Added support for Numpy 2.0: The library now fully supports `numpy` 2.0.0.
Core
New Script Behavior Changes
- Process Group Management: PyTorch now requires users to destroy process groups after training. The `accelerate` library will handle this automatically with `accelerator.end_training()`, or you can do it manually using `PartialState().destroy_process_group()` (see the sketch after this list).
- MLU Device Support: Added support for saving and loading RNG states on MLU devices by @​huismiling
- NPU Support: Corrected backend and distributed settings when using `transfer_to_npu`, ensuring better performance and compatibility.
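A minimal sketch of the new teardown flow, assuming a standard single-`Accelerator` script (the training loop itself is elided):

```python
from accelerate import Accelerator, PartialState

accelerator = Accelerator()
# ... prepare model/optimizer/dataloaders and run the training loop ...

# end_training() now also destroys the process group, matching
# PyTorch's requirement to clean up after distributed training.
accelerator.end_training()

# Manual equivalent, if you handle teardown yourself:
# PartialState().destroy_process_group()
```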
DataLoader Enhancements
- Stateful DataLoader: We are excited to announce that early support has been added for the `StatefulDataLoader` from `torchdata`, allowing better handling of data loading states. Enable it by passing `use_stateful_dataloader=True` to the `DataLoaderConfiguration`; when calling `load_state()` the `DataLoader` will automatically be resumed from its last step, with no more having to iterate through passed batches (see the sketch after this list).
- Decoupled Data Loader Preparation: The `prepare_data_loader()` function is now independent of the `Accelerator`, giving you more flexibility towards which API levels you would like to use.
- XLA Compatibility: Added support for skipping initial batches when using XLA.
- Improved State Management: Bug fixes and enhancements for saving/loading `DataLoader` states, ensuring smoother training sessions.
- Epoch Setting: Introduced the `set_epoch` function for `MpDeviceLoaderWrapper`.
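A minimal sketch of enabling the stateful dataloader (the dataset and checkpoint directory are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

# Opt in to the torchdata-backed StatefulDataLoader.
dataloader_config = DataLoaderConfiguration(use_stateful_dataloader=True)
accelerator = Accelerator(dataloader_config=dataloader_config)

dataset = TensorDataset(torch.randn(64, 4))  # placeholder dataset
dataloader = accelerator.prepare(DataLoader(dataset, batch_size=8))

# ... train, periodically calling accelerator.save_state("ckpt") ...

# On resume, load_state() also restores the dataloader's position,
# so there is no need to fast-forward through already-seen batches.
accelerator.load_state("ckpt")
```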
FP8 Training Improvements
- Enhanced FP8 Training: Fully Sharded Data Parallelism (FSDP) and DeepSpeed support now work seamlessly with `TransformerEngine` FP8 training, including better defaults for the quantized FP8 weights (see the sketch after this list).
- Integration baseline: We've added a new suite of examples and benchmarks to ensure that our `TransformerEngine` integration works exactly as intended. These scripts run one half using 🤗 Accelerate's integration, the other with raw `TransformerEngine`, providing users with a nice example of what we do under the hood with accelerate, and a good sanity check to make sure nothing breaks down over time. Find them here.
- Import Fixes: Resolved issues with import checks for the Transformer Engine that had downstream issues.
- FP8 Docker Images: We've added new docker images for `TransformerEngine` and `accelerate` as well. Use `docker pull huggingface/accelerate@gpu-fp8-transformerengine` to quickly get an environment going.
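A hedged sketch of how TransformerEngine-backed FP8 is typically requested in Accelerate; the `FP8RecipeKwargs` handler and its `backend="te"` argument come from the existing kwargs-handler API rather than these release notes, so treat them as assumptions:

```python
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

# Request TransformerEngine FP8 mixed precision; with this release the
# same setup can now be combined with FSDP or DeepSpeed.
accelerator = Accelerator(
    mixed_precision="fp8",
    kwargs_handlers=[FP8RecipeKwargs(backend="te")],
)
```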
torchpippy no more, long live torch.distributed.pipelining
- With the latest PyTorch release, `torchpippy` is now fully integrated into torch core, and as a result we are exclusively supporting the PyTorch implementation from now on.
- There are breaking examples and changes that come from this shift. Namely:
  - Tracing of inputs is done with the shape each GPU will see, rather than the size of the total batch. So for 2 GPUs, one should pass in an input of `[1, n, n]` rather than `[2, n, n]` as before (see the sketch after this list).
  - We no longer support Encoder/Decoder models. PyTorch tracing for `pipelining` no longer supports encoder/decoder models, so the `t5` example has been removed.
  - Computer vision model support currently does not work: there are some tracing issues regarding resnets that we are actively looking into.
- If either of these changes is too breaking, we recommend pinning your accelerate version. If the encoder/decoder model support is actively blocking your inference using pippy, please open an issue and let us know. We can look towards potentially adding back the old `torchpippy` support if needed.
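A hedged sketch of the new per-GPU tracing shape, using Accelerate's pipeline-parallel helper; `prepare_pippy` and its `example_args` parameter come from the existing inference API, and the GPT-2 model choice is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM

from accelerate.inference import prepare_pippy

model = AutoModelForCausalLM.from_pretrained("gpt2")

# With 2 GPUs, trace with the shape ONE GPU sees: batch size 1, not 2.
example_input = torch.randint(0, 50257, (1, 128))  # previously (2, 128)
model = prepare_pippy(model, example_args=(example_input,))
```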
Fully Sharded Data Parallelism (FSDP)
- Environment Flexibility: Environment variables are now fully optional for FSDP, simplifying configuration. You can now fully create a `FullyShardedDataParallelPlugin` yourself manually, with no need for environment patching:

```python
from accelerate import FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(...)
```
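The plugin is then passed straight to the `Accelerator`; a minimal continuation (the constructor arguments elided above stay elided):

```python
from accelerate import Accelerator

accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```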
- FSDP RAM-efficient loading: Added a utility to enable RAM-efficient model loading (by setting the proper environment variable). This is generally needed if you are not using `accelerate launch` and need to ensure the env variables are set up properly for model loading:

```python
from accelerate.utils import enable_fsdp_ram_efficient_loading, disable_fsdp_ram_efficient_loading

enable_fsdp_ram_efficient_loading()
```
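The imported counterpart, `disable_fsdp_ram_efficient_loading()`, reverts the environment change when you no longer want the RAM-efficient path.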
- Model State Dict Management: Enhanced support for unwrapping model state dicts in FSDP, making it easier to manage distributed models.
New Examples
- Configuration and Models: Improved configuration handling and introduced a configuration zoo for easier experimentation. You can learn more here. This was largely inspired by the `axolotl` library, so very big kudos to their wonderful work.
- FSDP + SLURM Example: Added a minimal configuration example for running jobs with SLURM and using FSDP.
Bug Fixes
- Fix bug of clip_grad_norm_ for xla fsdp by @​hanwen-sun in https://github.com/huggingface/accelerate/pull/2941
- Explicit check for `step` when loading the state by @​muellerzr in https://github.com/huggingface/accelerate/pull/2992
- Fix `find_tied_params` for models with shared layers by @​qubvel in https://github.com/huggingface/accelerate/pull/2986
- clear memory after offload by @​SunMarc in https://github.com/huggingface/accelerate/pull/2994
- fix default value for rank size in cpu threads_per_process assignment logic by @​rbrugaro in https://github.com/huggingface/accelerate/pull/3009
- Fix batch_sampler maybe None error by @​candlewill in https://github.com/huggingface/accelerate/pull/3025
- Do not import `transformer_engine` on import by @​oraluben in https://github.com/huggingface/accelerate/pull/3056
- Fix torchvision to be compatible with torch version in CI by @​SunMarc in https://github.com/huggingface/accelerate/pull/2982
- Fix gated test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2993
- Fix typo on warning str: "on the meta device device" -> "on the meta device" by @​HeAndres in https://github.com/huggingface/accelerate/pull/2997
- Fix deepspeed tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/3003
- Fix torch version check by @​muellerzr in https://github.com/huggingface/accelerate/pull/3024
- Fix fp8 benchmark on single GPU by @​muellerzr in https://github.com/huggingface/accelerate/pull/3032
- Fix typo in comment by @​zmoki688 in https://github.com/huggingface/accelerate/pull/3045
- Speed up tests by shaving off subprocess when not needed by @​muellerzr in https://github.com/huggingface/accelerate/pull/3042
- Remove `skip_first_batches` support for StatefulDataloader and fix all the tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/3068
New Contributors
- @​byi8220 made their first contribution in https://github.com/huggingface/accelerate/pull/2957
- @​alex-jw-brooks made their first contribution in https://github.com/huggingface/accelerate/pull/2959
- @​XciD made their first contribution in https://github.com/huggingface/accelerate/pull/2981
- @​hanwen-sun made their first contribution in https://github.com/huggingface/accelerate/pull/2941
- @​HeAndres made their first contribution in https://github.com/huggingface/accelerate/pull/2997
- @​yitongh made their first contribution in https://github.com/huggingface/accelerate/pull/2966
- @​qubvel made their first contribution in https://github.com/huggingface/accelerate/pull/2986
- @​rbrugaro made their first contribution in https://github.com/huggingface/accelerate/pull/3009
- @​candlewill made their first contribution in https://github.com/huggingface/accelerate/pull/3025
- @​siddk made their first contribution in https://github.com/huggingface/accelerate/pull/3047
- @​oraluben made their first contribution in https://github.com/huggingface/accelerate/pull/3056
- @​tmm1 made their first contribution in https://github.com/huggingface/accelerate/pull/3055
- @​zmoki688 made their first contribution in https://github.com/huggingface/accelerate/pull/3045
Full Changelog:
- Require safetensors>=0.4.3 by @​byi8220 in https://github.com/huggingface/accelerate/pull/2957
- Fix torchvision to be compatible with torch version in CI by @​SunMarc in https://github.com/huggingface/accelerate/pull/2982
- Enable Unwrapping for Model State Dicts (FSDP) by @​alex-jw-brooks in https://github.com/huggingface/accelerate/pull/2959
- chore: Update runs-on configuration for CI workflows by @​XciD in https://github.com/huggingface/accelerate/pull/2981
- add MLU devices for rng state saving and loading. by @​huismiling in https://github.com/huggingface/accelerate/pull/2940
- remove .md to allow proper linking by @​nbroad1881 in https://github.com/huggingface/accelerate/pull/2977
- Fix bug of clip_grad_norm_ for xla fsdp by @​hanwen-sun in https://github.com/huggingface/accelerate/pull/2941
- Fix gated test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2993
- Explicit check for `step` when loading the state by @​muellerzr in https://github.com/huggingface/accelerate/pull/2992
- Fix typo on warning str: "on the meta device device" -> "on the meta device" by @​HeAndres in https://github.com/huggingface/accelerate/pull/2997
- Support skip_first_batches for XLA by @​yitongh in https://github.com/huggingface/accelerate/pull/2966
- clear memory after offload by @​SunMarc in https://github.com/huggingface/accelerate/pull/2994
- Fix deepspeed tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/3003
- Make env variables optional for FSDP by @​muellerzr in https://github.com/huggingface/accelerate/pull/2998
- Add small util to enable FSDP offloading quickly by @​muellerzr in https://github.com/huggingface/accelerate/pull/3006
- update version to 0.34.dev0 by @​SunMarc in https://github.com/huggingface/accelerate/pull/3007
- Fix `find_tied_params` for models with shared layers by @​qubvel in https://github.com/huggingface/accelerate/pull/2986
- Enable FSDP & Deepspeed + FP8 by @​muellerzr in https://github.com/huggingface/accelerate/pull/2983
- fix default value for rank size in cpu threads_per_process assignment logic by @​rbrugaro in https://github.com/huggingface/accelerate/pull/3009
- Wrong import check for TE by @​muellerzr in https://github.com/huggingface/accelerate/pull/3016
- destroy process group in `end_training` by @​SunMarc in https://github.com/huggingface/accelerate/pull/3012
- Tweak defaults for quantized-typed FP8 TE weights by @​muellerzr in https://github.com/huggingface/accelerate/pull/3018
- Set correct NPU backend and distributed_type when using transfer_to_npu by @​ArthurinRUC in https://github.com/huggingface/accelerate/pull/3021
- Fix torch version check by @​muellerzr in https://github.com/huggingface/accelerate/pull/3024
- Add end_training/destroy_pg to everything and unpin numpy by @​muellerzr in https://github.com/huggingface/accelerate/pull/3030
- Improve config handling and add a zoo by @​muellerzr in https://github.com/huggingface/accelerate/pull/3029
- Add early support for `torchdata.stateful_dataloader.StatefulDataLoader` within the `Accelerator` by @​byi8220 in https://github.com/huggingface/accelerate/pull/2895
- Fix fp8 benchmark on single GPU by @​muellerzr in https://github.com/huggingface/accelerate/pull/3032
- Fix batch_sampler maybe None error by @​candlewill in https://github.com/huggingface/accelerate/pull/3025
- Fixup dataloader state dict bugs + incorporate load/save_state API by @​muellerzr in https://github.com/huggingface/accelerate/pull/3034
- Decouple `prepare_data_loader()` from Accelerator by @​siddk in https://github.com/huggingface/accelerate/pull/3047
- Update CONTRIBUTING.md Setup Instructions by @​siddk in https://github.com/huggingface/accelerate/pull/3046
- Add a SLURM example with minimal config by @​muellerzr in https://github.com/huggingface/accelerate/pull/2950
- Add FP8 docker images by @​muellerzr in https://github.com/huggingface/accelerate/pull/3048
- Update torchpippy by @​muellerzr in https://github.com/huggingface/accelerate/pull/2938
- Do not import `transformer_engine` on import by @​oraluben in https://github.com/huggingface/accelerate/pull/3056
- use duck-typing to ensure underlying optimizer supports schedulefree hooks by @​tmm1 in https://github.com/huggingface/accelerate/pull/3055
- Fix typo in comment by @​zmoki688 in https://github.com/huggingface/accelerate/pull/3045
- add set_epoch for MpDeviceLoaderWrapper by @​hanwen-sun in https://github.com/huggingface/accelerate/pull/3053
- Speed up tests by shaving off subprocess when not needed by @​muellerzr in https://github.com/huggingface/accelerate/pull/3042
- Remove `skip_first_batches` support for StatefulDataloader and fix all the tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/3068
Detailed Full Changelog:
- https://github.com/huggingface/accelerate/compare/v0.33.0...v0.34.0
v0.33.0: MUSA backend support and bugfixes
MUSA backend support and bugfixes
Small release this month, with key focuses on some added support for backends and bugs:
- Support MUSA (Moore Threads GPU) backend in accelerate by @​fmo-mt in https://github.com/huggingface/accelerate/pull/2917
- Allow multiple process per device by @​cifkao in https://github.com/huggingface/accelerate/pull/2916
- Add `torch.float8_e4m3fn` format `dtype_byte_size` by @​SunMarc in https://github.com/huggingface/accelerate/pull/2945
- Properly handle Params4bit in set_module_tensor_to_device by @​matthewdouglas in https://github.com/huggingface/accelerate/pull/2934
What's Changed
- [tests] fix bug in torch_device by @​faaany in https://github.com/huggingface/accelerate/pull/2909
- Fix slowdown on init with `device_map="auto"` by @​muellerzr in https://github.com/huggingface/accelerate/pull/2914
- fix: bug where `multi_gpu` was being set and warning being printed even with `num_processes=1` by @​HarikrishnanBalagopal in https://github.com/huggingface/accelerate/pull/2921
- Better error when a bad directory is given for weight merging by @​muellerzr in https://github.com/huggingface/accelerate/pull/2852
- add xpu device check before moving tensor directly to xpu device by @​faaany in https://github.com/huggingface/accelerate/pull/2928
- Add huggingface_hub version to setup.py by @​nullquant in https://github.com/huggingface/accelerate/pull/2932
- Correct loading of models with shared tensors when using accelerator.load_state() by @​jkuntzer in https://github.com/huggingface/accelerate/pull/2875
- Hotfix PyTorch Version Installation in CI Workflow for Minimum Version Matrix by @​yhna940 in https://github.com/huggingface/accelerate/pull/2889
- Fix import test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2931
- Consider pynvml available when installed through the nvidia-ml-py distribution by @​matthewdouglas in https://github.com/huggingface/accelerate/pull/2936
- Improve test reliability for Accelerator.free_memory() by @​matthewdouglas in https://github.com/huggingface/accelerate/pull/2935
- delete CCL env var setting by @​Liangliang-Ma in https://github.com/huggingface/accelerate/pull/2927
- feat(ci): add `pip` caching in CI by @​SauravMaheshkar in https://github.com/huggingface/accelerate/pull/2952
New Contributors
- @​HarikrishnanBalagopal made their first contribution in https://github.com/huggingface/accelerate/pull/2921
- @​fmo-mt made their first contribution in https://github.com/huggingface/accelerate/pull/2917
- @​nullquant made their first contribution in https://github.com/huggingface/accelerate/pull/2932
- @​cifkao made their first contribution in https://github.com/huggingface/accelerate/pull/2916
- @​jkuntzer made their first contribution in https://github.com/huggingface/accelerate/pull/2875
- @​matthewdouglas made their first contribution in https://github.com/huggingface/accelerate/pull/2936
- @​Liangliang-Ma made their first contribution in https://github.com/huggingface/accelerate/pull/2927
- @​SauravMaheshkar made their first contribution in https://github.com/huggingface/accelerate/pull/2952
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.32.1...v0.33.0
v0.32.1
v0.32.0: Profilers, new hooks, speedups, and more!
Core
- Utilize shard saving from the `huggingface_hub` rather than our own implementation (https://github.com/huggingface/accelerate/pull/2795)
- Refactor logging to use logger in `dispatch_model` (https://github.com/huggingface/accelerate/pull/2855)
- The `Accelerator.step` number is now restored when using `save_state` and `load_state` (https://github.com/huggingface/accelerate/pull/2765)
- A new profiler has been added, allowing users to collect performance metrics during model training and inference, including detailed analysis of execution time and memory consumption; the results can then be viewed in Chrome's tracing tool. Read more about it here (https://github.com/huggingface/accelerate/pull/2883); see the sketch after this list.
- Reduced import times for `import accelerate` and any other major core import by 68%; it should now be only slightly longer than `import torch` (https://github.com/huggingface/accelerate/pull/2845)
- Fixed a bug in `get_backend` and added a `clear_device_cache` utility (https://github.com/huggingface/accelerate/pull/2857)
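A hedged sketch of the profiler; the `ProfileKwargs` handler, its fields, and the `accelerator.profile()` context manager are assumptions based on Accelerate's kwargs-handler pattern rather than details spelled out in these notes:

```python
from accelerate import Accelerator, ProfileKwargs

# Record CPU and CUDA activity and emit a Chrome trace file.
profile_kwargs = ProfileKwargs(
    activities=["cpu", "cuda"],
    output_trace_dir="trace",
)
accelerator = Accelerator(kwargs_handlers=[profile_kwargs])

with accelerator.profile() as prof:
    pass  # training or inference steps go here

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```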
Distributed Data Parallelism
- Introduce DDP communication hooks to have more flexibility in how gradients are communicated across workers, overriding the standard `allreduce` (https://github.com/huggingface/accelerate/pull/2841); see the sketch after this list.
- Make `log_line_prefix_template` optional for the `notebook_launcher` (https://github.com/huggingface/accelerate/pull/2888)
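A hedged sketch of opting into a communication hook; the `comm_hook` field and the `DDPCommunicationHookType` enum are assumptions about how PR 2841 exposes this:

```python
from accelerate import Accelerator, DistributedDataParallelKwargs
from accelerate.utils import DDPCommunicationHookType

# Compress gradients to fp16 during all-reduce instead of the default hook.
ddp_kwargs = DistributedDataParallelKwargs(comm_hook=DDPCommunicationHookType.FP16)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```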
FSDP
- If the output directory doesn't exist when using `accelerate merge-weights`, one will be automatically created (https://github.com/huggingface/accelerate/pull/2854)
- When merging weights, the default is now `.safetensors` (https://github.com/huggingface/accelerate/pull/2853)
XPU
- Migrate to pytorch's native XPU backend on `torch>=2.4` (https://github.com/huggingface/accelerate/pull/2825)
- Add `@require_triton` test decorator and enable `test_dynamo` to work on xpu (https://github.com/huggingface/accelerate/pull/2878)
- Fixed `load_state_dict` not working on `xpu` and refined the xpu `safetensors` version check (https://github.com/huggingface/accelerate/pull/2879)
XLA
- Added support for XLA Dynamo backends for both training and inference (https://github.com/huggingface/accelerate/pull/2892)
Examples
- Added a new multi-cpu SLURM example using `accelerate launch` (https://github.com/huggingface/accelerate/pull/2902)
Full Changelog
- Use shard saving from huggingface_hub by @​SunMarc in https://github.com/huggingface/accelerate/pull/2795
- doc: fix link by @​imba-tjd in https://github.com/huggingface/accelerate/pull/2844
- Revert "Slight rename" by @​SunMarc in https://github.com/huggingface/accelerate/pull/2850
- remove warning hook addede during dispatch_model by @​SunMarc in https://github.com/huggingface/accelerate/pull/2843
- Remove underlines between badges by @​novialriptide in https://github.com/huggingface/accelerate/pull/2851
- Auto create dir when merging FSDP weights by @​helloworld1 in https://github.com/huggingface/accelerate/pull/2854
- Add DDP Communication Hooks by @​yhna940 in https://github.com/huggingface/accelerate/pull/2841
- Refactor logging to use logger in `dispatch_model` by @​panjd123 in https://github.com/huggingface/accelerate/pull/2855
- xpu: support xpu backend from stock pytorch (>=2.4) by @​dvrogozh in https://github.com/huggingface/accelerate/pull/2825
- Drop torch re-imports in npu and mlu paths by @​dvrogozh in https://github.com/huggingface/accelerate/pull/2856
- Default FSDP weights merge to safetensors by @​helloworld1 in https://github.com/huggingface/accelerate/pull/2853
- [tests] fix bug in `test_tracking.ClearMLTest` by @​faaany in https://github.com/huggingface/accelerate/pull/2863
- [tests] use `torch_device` instead of `0` for device check by @​faaany in https://github.com/huggingface/accelerate/pull/2861
- [tests] skip bnb-related tests instead of failing on xpu by @​faaany in https://github.com/huggingface/accelerate/pull/2860
- Potentially fix tests by @​muellerzr in https://github.com/huggingface/accelerate/pull/2862
- [tests] enable XPU backend for `test_zero3_integration` by @​faaany in https://github.com/huggingface/accelerate/pull/2864
- Support saving and loading of step while saving and loading state by @​bipinKrishnan in https://github.com/huggingface/accelerate/pull/2765
- Add Profiler Support for Performance Analysis by @​yhna940 in https://github.com/huggingface/accelerate/pull/2883
- Speed up imports and add a CI by @​muellerzr in https://github.com/huggingface/accelerate/pull/2845
- Make `log_line_prefix_template` Optional in Elastic Launcher for Backward Compatibility by @​yhna940 in https://github.com/huggingface/accelerate/pull/2888
- Add XLA Dynamo backends for training and inference by @​johnsutor in https://github.com/huggingface/accelerate/pull/2892
- Added a MultiCPU SLURM example using Accelerate Launch and MPIRun by @​okhleif-IL in https://github.com/huggingface/accelerate/pull/2902
- make more cuda-only tests device-agnostic by @​faaany in https://github.com/huggingface/accelerate/pull/2876
- fix mlu device longTensor bugs by @​huismiling in https://github.com/huggingface/accelerate/pull/2887
- add `require_triton` and enable `test_dynamo` work on xpu by @​faaany in https://github.com/huggingface/accelerate/pull/2878
- fix `load_state_dict` for xpu and refine xpu safetensor version check by @​faaany in https://github.com/huggingface/accelerate/pull/2879
- Fix get_backend bug and add clear_device_cache function by @​NurmaU in https://github.com/huggingface/accelerate/pull/2857
New Contributors
- @​McPatate made their first contribution in https://github.com/huggingface/accelerate/pull/2836
- @​imba-tjd made their first contribution in https://github.com/huggingface/accelerate/pull/2844
- @​novialriptide made their first contribution in https://github.com/huggingface/accelerate/pull/2851
- @​panjd123 made their first contribution in https://github.com/huggingface/accelerate/pull/2855
- @​dvrogozh made their first contribution in https://github.com/huggingface/accelerate/pull/2825
- @​johnsutor made their first contribution in https://github.com/huggingface/accelerate/pull/2892
- @​okhleif-IL made their first contribution in https://github.com/huggingface/accelerate/pull/2902
- @​NurmaU made their first contribution in https://github.com/huggingface/accelerate/pull/2857
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.31.0...v0.32.0
v0.31.0: Better support for sharded state dict with FSDP and Bugfixes
Core
- Set `timeout` default to PyTorch defaults based on backend by @​muellerzr in https://github.com/huggingface/accelerate/pull/2758
- fix duplicate elements in split_between_processes by @​hkunzhe in https://github.com/huggingface/accelerate/pull/2781
- Add Elastic Launch Support to `notebook_launcher` by @​yhna940 in https://github.com/huggingface/accelerate/pull/2788
- Fix Wrong use of sync_gradients used to implement sync_each_batch by @​fabianlim in https://github.com/huggingface/accelerate/pull/2790
FSDP
- Introduce shard-merging util for FSDP by @​muellerzr in https://github.com/huggingface/accelerate/pull/2772 (see the sketch after this list)
- Enable sharded state dict + offload to cpu resume by @​muellerzr in https://github.com/huggingface/accelerate/pull/2762
- Enable config for fsdp activation checkpointing by @​helloworld1 in https://github.com/huggingface/accelerate/pull/2779
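A hedged sketch of the shard-merging utility's Python entry point; the `merge_fsdp_weights` name and signature are assumptions about what PR 2772 introduced, and the paths are placeholders:

```python
from accelerate.utils import merge_fsdp_weights

# Merge sharded FSDP checkpoint files into a single (safetensors) checkpoint.
merge_fsdp_weights(
    checkpoint_dir="ckpt/pytorch_model_fsdp_0",  # placeholder path
    output_path="merged_model",                  # placeholder path
    safe_serialization=True,
)
```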
Megatron
- Upgrade huggingface's megatron to nvidia's megatron when use MegatronLMPlugin by @​zhangsheng377 in https://github.com/huggingface/accelerate/pull/2501
What's Changed
- Add feature to allow redirecting std streams into log files when using torchrun as the launcher. by @​lyuwen in https://github.com/huggingface/accelerate/pull/2740
- Update modeling.py by adding try-catch section to skip the unavailable devices by @​MeVeryHandsome in https://github.com/huggingface/accelerate/pull/2681
- Fixed the problem of incorrect conditional judgment statement when configuring enable_cpu_affinity by @​statelesshz in https://github.com/huggingface/accelerate/pull/2748
- Fix stacklevel in `logging` to log the actual user call site (instead of the call site inside the logger wrapper) of log functions by @​luowyang in https://github.com/huggingface/accelerate/pull/2730
- LOMO / FIX: Support multiple optimizers by @​younesbelkada in https://github.com/huggingface/accelerate/pull/2745
- Fix max_memory assignment by @​SunMarc in https://github.com/huggingface/accelerate/pull/2751
- Fix duplicate environment variable check in multi-cpu condition by @​yhna940 in https://github.com/huggingface/accelerate/pull/2752
- Simplify CLI args validation and ensure CLI args take precedence over config file. by @​Iain-S in https://github.com/huggingface/accelerate/pull/2757
- Fix sagemaker config by @​muellerzr in https://github.com/huggingface/accelerate/pull/2753
- fix cpu omp num threads set by @​jiqing-feng in https://github.com/huggingface/accelerate/pull/2755
- Revert "Simplify CLI args validation and ensure CLI args take precedence over config file." by @​muellerzr in https://github.com/huggingface/accelerate/pull/2763
- Enable sharded cpu resume by @​muellerzr in https://github.com/huggingface/accelerate/pull/2762
- Sets default to PyTorch defaults based on backend by @​muellerzr in https://github.com/huggingface/accelerate/pull/2758
- optimize get_module_leaves speed by @​BBuf in https://github.com/huggingface/accelerate/pull/2756
- fix minor typo by @​TemryL in https://github.com/huggingface/accelerate/pull/2767
- Fix small edge case in get_module_leaves by @​SunMarc in https://github.com/huggingface/accelerate/pull/2774
- Skip deepspeed test by @​SunMarc in https://github.com/huggingface/accelerate/pull/2776
- Enable config for fsdp activation checkpointing by @​helloworld1 in https://github.com/huggingface/accelerate/pull/2779
- Add arg from CLI to fix failing test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2783
- Skip tied weights disk offload test by @​SunMarc in https://github.com/huggingface/accelerate/pull/2782
- Introduce shard-merging util for FSDP by @​muellerzr in https://github.com/huggingface/accelerate/pull/2772
- FIX / FSDP : Guard fsdp utils for earlier PyTorch versions by @​younesbelkada in https://github.com/huggingface/accelerate/pull/2794
- Upgrade huggingface's megatron to nvidia's megatron when use MegatronLMPlugin by @​zhangsheng377 in https://github.com/huggingface/accelerate/pull/2501
- Fixup CLI test by @​muellerzr in https://github.com/huggingface/accelerate/pull/2796
- fix duplicate elements in split_between_processes by @​hkunzhe in https://github.com/huggingface/accelerate/pull/2781
- Add Elastic Launch Support to `notebook_launcher` by @​yhna940 in https://github.com/huggingface/accelerate/pull/2788
- Fix Wrong use of sync_gradients used to implement sync_each_batch by @​fabianlim in https://github.com/huggingface/accelerate/pull/2790
- Fix type in accelerator.py by @​qgallouedec in https://github.com/huggingface/accelerate/pull/2800
- fix comet ml test by @​SunMarc in https://github.com/huggingface/accelerate/pull/2804
- New template by @​muellerzr in https://github.com/huggingface/accelerate/pull/2808
- Fix access error for torch.mps when using torch==1.13.1 on macOS by @​SunMarc in https://github.com/huggingface/accelerate/pull/2806
- 4-bit quantization meta device bias loading bug by @​SunMarc in https://github.com/huggingface/accelerate/pull/2805
- State dictionary retrieval from offloaded modules by @​blbadger in https://github.com/huggingface/accelerate/pull/2619
- add cuda dep for a test by @​SunMarc in https://github.com/huggingface/accelerate/pull/2820
- Remove out-dated xpu device check code in `get_balanced_memory` by @​faaany in https://github.com/huggingface/accelerate/pull/2826
- Fix DeepSpeed config validation error by changing `stage3_prefetch_bucket_size` value to an integer by @​adk9 in https://github.com/huggingface/accelerate/pull/2814
- Improve test speeds by up to 30% in multi-gpu settings by @​muellerzr in https://github.com/huggingface/accelerate/pull/2830
- monitor-interval, take 2 by @​muellerzr in https://github.com/huggingface/accelerate/pull/2833
- Optimize the megatron plugin by @​zhangsheng377 in https://github.com/huggingface/accelerate/pull/2822
- fix fstr format by @​Jintao-Huang in https://github.com/huggingface/accelerate/pull/2810
New Contributors
- @​lyuwen made their first contribution in https://github.com/huggingface/accelerate/pull/2740
- @​MeVeryHandsome made their first contribution in https://github.com/huggingface/accelerate/pull/2681
- @​luowyang made their first contribution in https://github.com/huggingface/accelerate/pull/2730
- @​Iain-S made their first contribution in https://github.com/huggingface/accelerate/pull/2757
- @​BBuf made their first contribution in https://github.com/huggingface/accelerate/pull/2756
- @​TemryL made their first contribution in https://github.com/huggingface/accelerate/pull/2767
- @​helloworld1 made their first contribution in https://github.com/huggingface/accelerate/pull/2779
- @​hkunzhe made their first contribution in https://github.com/huggingface/accelerate/pull/2781
- @​adk9 made their first contribution in https://github.com/huggingface/accelerate/pull/2814
- @​Jintao-Huang made their first contribution in https://github.com/huggingface/accelerate/pull/2810
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.30.1...v0.31.0
v0.30.1: Bugfixes
Patchfix
- Fix duplicate environment variable check in multi-cpu condition thanks to @​yhna940 in https://github.com/huggingface/accelerate/pull/2752
- Fix issue with missing values in the SageMaker config leading to not being able to launch in https://github.com/huggingface/accelerate/pull/2753
- Fix CPU OMP num threads setting thanks to @​jiqing-feng in https://github.com/huggingface/accelerate/pull/2755
- Fix FSDP checkpoint unable to resume when using offloading and sharded weights due to CUDA OOM when loading the optimizer and model https://github.com/huggingface/accelerate/pull/2762
- Fixed the problem of incorrect conditional judgment statement when configuring enable_cpu_affinity thanks to @​statelesshz in https://github.com/huggingface/accelerate/pull/2748
- Fix stacklevel in logging to log the actual user call site (instead of the call site inside the logger wrapper) of log functions thanks to @​luowyang in https://github.com/huggingface/accelerate/pull/2730
- Fix support for multiple optimizers when using LOMO thanks to @​younesbelkada in https://github.com/huggingface/accelerate/pull/2745
Full Changelog: https://github.com/huggingface/accelerate/compare/v0.30.0...v0.30.1
v0.30.0: Advanced optimizer support, MoE DeepSpeed support, add upcasting for FSDP, and more
Core
- We've simplified the `tqdm` wrapper to make it fully passthrough: no need to have `tqdm(main_process_only, *args)`, it is now just `tqdm(*args)` and you can pass in `main_process_only` as a kwarg (see the sketch after this list).
- We've added support for advanced optimizer usage:
  - Schedule free optimizer introduced by Meta by @​muellerzr in https://github.com/huggingface/accelerate/pull/2631
  - LOMO optimizer introduced by OpenLMLab by @​younesbelkada in https://github.com/huggingface/accelerate/pull/2695
- Enable BF16 autocast to everything during FP8 and enable FSDP by @​muellerzr in https://github.com/huggingface/accelerate/pull/2655
- Support dataloader send_to_device calls to use no
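A minimal sketch of the simplified wrapper; the kwarg name `main_process_only` follows the wrapper's signature in `accelerate.utils` and should be treated as an assumption:

```python
from accelerate.utils import tqdm

# Fully passthrough: positional args go straight to tqdm, and the
# main-process gating is now an explicit keyword argument.
for batch in tqdm(range(100), main_process_only=True):
    pass  # work on the batch
```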
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻️ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
- [ ] If you want to rebase/retry this PR, check this box
This PR was generated by Mend Renovate. View the repository job log.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 37.00%. Comparing base (`6530de4`) to head (`8cc06e0`). Report is 1 commit behind head on main.
:exclamation: Current head `8cc06e0` differs from the pull request's most recent head `e01f6db`. Please upload reports for the commit `e01f6db` to get more accurate results.
Additional details and impacted files
```
@@           Coverage Diff           @@
##             main     #273   +/-   ##
=======================================
  Coverage   37.00%   37.00%
=======================================
  Files          23       23
  Lines        1481     1481
  Branches      202      202
=======================================
  Hits          548      548
  Misses        925      925
  Partials        8        8
```
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.