DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
I just don't want to print the warnings and info messages.
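One common way to quiet such messages is to raise the log level on the relevant loggers through Python's standard `logging` module. This is a minimal sketch; the logger name `"DeepSpeed"` is an assumption here — inspect `logging.root.manager.loggerDict` at runtime to see the names your install actually registers:

```python
import logging

# Raise the threshold so INFO and WARNING records are dropped.
# "DeepSpeed" as a logger name is an assumption -- check
# logging.root.manager.loggerDict for the names your install uses.
for name in ("DeepSpeed", "torch.distributed"):
    logging.getLogger(name).setLevel(logging.ERROR)

logger = logging.getLogger("DeepSpeed")
logger.info("this is suppressed")        # below ERROR, not emitted
logger.error("this still gets through")  # ERROR and above still print
```

Errors still surface, so genuine failures are not hidden along with the chatter.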
**Describe the bug** I am training an LLM using DeepSpeed on 12 nodes with 8 V100s per node. My training is generally working well (thanks DeepSpeed), but when I run...
Deprecate the redundant sequence_data_parallel_group argument. Users/client code will control which process group ZeRO-3 parameters are partitioned across, choosing from [None, data_parallel_group, sequence_data_parallel].
- Removed conditional imports of TritonMLP and TritonSelfAttention from module level
- Implemented lazy imports for Triton modules inside __init__ method
- This change aims to resolve circular dependency issues...
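The lazy-import pattern described above can be sketched generically. The class and module names below are stand-ins, not DeepSpeed's actual layout; `json` plays the role of a heavy or optional dependency like Triton:

```python
# Sketch of the lazy-import pattern: instead of importing an optional
# dependency at module load time (which can create circular imports),
# defer the import to the point of first use inside __init__.

class OpBuilder:  # hypothetical class, not DeepSpeed's API
    def __init__(self, use_codec=True):
        if use_codec:
            # Imported here, not at module level, so merely importing
            # this module never touches the optional dependency.
            import json  # stand-in for a heavy/optional module like triton
            self._codec = json
        else:
            self._codec = None

    def encode(self, obj):
        if self._codec is None:
            raise RuntimeError("codec backend not available")
        return self._codec.dumps(obj)
```

Because the import runs only when an instance is constructed, two modules that reference each other's classes can both finish loading before either import is actually executed.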
I want to use the mixtral 8X7B model for inference, but currently it only supports autoTP. How to add more support to enable it to use more parallelism (e.g. EP,...
When I fine-tune the model using DeepSpeed on 2 A800 machines, the log only contains worker1, not worker2. Is there any way to print the loss of worker2? The GPUs on both machines are...
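A common workaround is to log from every rank explicitly rather than relying on rank-0-only logging. A minimal sketch: the `RANK` environment variable is set by the usual launchers (torchrun, the deepspeed launcher); the `log_loss` helper itself is hypothetical, not part of DeepSpeed:

```python
import os

def log_loss(step, loss):
    # RANK is set by launchers such as torchrun and the deepspeed
    # launcher; default to 0 when running without a launcher.
    rank = int(os.environ.get("RANK", 0))
    line = f"[rank {rank}] step {step}: loss {loss:.4f}"
    print(line, flush=True)  # flush so multi-process output interleaves promptly
    return line

log_loss(10, 2.3456)
```

Each process writes its own tagged line, so losses from both workers appear in the combined log regardless of which rank the framework considers "main".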
**Describe the bug** For ZeRO-3, I'm noticing an increase in training times on g5.48xlarge nodes with torch >= 2.3.1 and CUDA 12.1. I can reproduce this with small and large...
I am trying DeepSpeed. I read the docs and modified one project for it, and I am getting strange results: 1) Original code without any speedup. 1 docker container....
The Nightly CI for https://github.com/microsoft/DeepSpeed/actions/runs/10985541802 failed.