DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
I just don't want to print the warnings and info messages.
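One common way to quiet such messages is to raise the log level on the relevant loggers through Python's standard `logging` module. This is a minimal sketch; the logger name `"DeepSpeed"` is an assumption here — inspect `logging.root.manager.loggerDict` at runtime to see the names your install actually registers:

```python
import logging

# Raise the threshold so INFO and WARNING records are dropped.
# "DeepSpeed" as a logger name is an assumption -- check
# logging.root.manager.loggerDict for the names your install uses.
for name in ("DeepSpeed", "torch.distributed"):
    logging.getLogger(name).setLevel(logging.ERROR)

logger = logging.getLogger("DeepSpeed")
logger.info("this is suppressed")        # below ERROR, not emitted
logger.error("this still gets through")  # ERROR and above still print
```

Errors still surface, so genuine failures are not hidden along with the chatter.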
**Describe the bug** I am training an LLM using DeepSpeed on 12 nodes with 8 V100s per node. My training is generally working well (thanks DeepSpeed), but when I run...
Deprecate the redundant sequence_data_parallel_group argument. Users/client code will control which process group ZeRO-3 parameters are partitioned across, choosing from [None, data_parallel_group, sequence_data_parallel].
- Removed conditional imports of TritonMLP and TritonSelfAttention from module level
- Implemented lazy imports for Triton modules inside __init__ method
- This change aims to resolve circular dependency issues...
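The lazy-import pattern described above can be sketched generically. The class and module names below are stand-ins, not DeepSpeed's actual layout; `json` plays the role of a heavy or optional dependency like Triton:

```python
# Sketch of the lazy-import pattern: instead of importing an optional
# dependency at module load time (which can create circular imports),
# defer the import to the point of first use inside __init__.

class OpBuilder:  # hypothetical class, not DeepSpeed's API
    def __init__(self, use_codec=True):
        if use_codec:
            # Imported here, not at module level, so merely importing
            # this module never touches the optional dependency.
            import json  # stand-in for a heavy/optional module like triton
            self._codec = json
        else:
            self._codec = None

    def encode(self, obj):
        if self._codec is None:
            raise RuntimeError("codec backend not available")
        return self._codec.dumps(obj)
```

Because the import runs only when an instance is constructed, two modules that reference each other's classes can both finish loading before either import is actually executed.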
I want to use the mixtral 8X7B model for inference, but currently it only supports autoTP. How to add more support to enable it to use more parallelism (e.g. EP,...
When I fine-tune the model using DeepSpeed on 2 A800 machines, the log only contains worker1, not worker2. Is there any way to print the loss of worker2? The GPUs on both machines are...
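A common workaround is to log from every rank explicitly rather than relying on rank-0-only logging. A minimal sketch: the `RANK` environment variable is set by the usual launchers (torchrun, the deepspeed launcher); the `log_loss` helper itself is hypothetical, not part of DeepSpeed:

```python
import os

def log_loss(step, loss):
    # RANK is set by launchers such as torchrun and the deepspeed
    # launcher; default to 0 when running without a launcher.
    rank = int(os.environ.get("RANK", 0))
    line = f"[rank {rank}] step {step}: loss {loss:.4f}"
    print(line, flush=True)  # flush so multi-process output interleaves promptly
    return line

log_loss(10, 2.3456)
```

Each process writes its own tagged line, so losses from both workers appear in the combined log regardless of which rank the framework considers "main".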
**Describe the bug** For ZeRO-3, I'm noticing an increase in training times on g5.48xlarge nodes with torch >= 2.3.1 and CUDA 12.1. I can reproduce this with small and large...
I am trying DeepSpeed. I read the docs and modified one project for it, and I am getting strange results: 1) Original code without any speedup. 1 docker container....
The Nightly CI for https://github.com/microsoft/DeepSpeed/actions/runs/10985541802 failed.