
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Results: 1,333 DeepSpeed issues (sorted by recently updated)

**Describe the bug** Encountered an illegal memory access when using the inference engine for a RoBERTa model with long sequences (e.g., 512). For...

bug
inference

**Describe the bug** When using the inference engine for a RoBERTa model, the output is unexpected when using batch size > 1. ...

bug
inference

Refactor DeepSpeed Config sub-configs (i.e., activation checkpointing, autotuning, comms, compression, monitor, nebula, and profiling) to use the pydantic library.
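As a hedged illustration of the proposed pattern (not DeepSpeed's actual config classes or field names), a pydantic-based sub-config might look like:

```python
# Hypothetical sketch: each DeepSpeed sub-config becomes a pydantic model
# so fields are typed, defaulted, and validated when the config is loaded.
# MonitorConfig and its fields are illustrative, not DeepSpeed's real schema.
from pydantic import BaseModel


class MonitorConfig(BaseModel):
    enabled: bool = False
    output_path: str = "./monitor_logs"


# Values parsed from a JSON config dict are validated on construction.
cfg = MonitorConfig(**{"enabled": True})
```

The payoff over hand-rolled dict parsing is that typos and type mismatches raise a validation error at load time instead of failing deep inside training.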

**The code is** (package versions: transformers==4.21.1, torch==1.11.0, deepspeed==0.6.5, CUDA 11.3, GPU: RTX 3090):

```python
import torch
from transformers import BertTokenizer, BartForConditionalGeneration, BertModel, BertLMHeadModel
from transformers.activations import GELUActivation
from deepspeed.profiling.flops_profiler import FlopsProfiler
...
```

bug

May I know why [this training code](https://colab.research.google.com/drive/1v5wY22CkyvKPz21tdwSMPv0T3fsIro0D?usp=sharing#scrollTo=6qJRPd9-sEdK) still gives a CUDA out-of-memory error even after DeepSpeed is turned on? ![image](https://user-images.githubusercontent.com/3324659/188926300-88fd6d1d-ed36-4351-9169-013013218ea7.png) See [this comment](https://github.com/microsoft/DeepSpeed/issues/2029#issuecomment-1229470437) for historical tracking purposes.

bug

Continues the refactor of distributed unit tests started in #2141 and #2180. Also includes a fix for the broken nightly test (lm-eval).

**Describe the bug** Similar to #2233 and #2133, I'm seeing garbage output when using multi-GPU fp16 inference for GPT-NeoX. Running the script below with GPT-NeoX replaced by GPT-Neo-2.7B works fine. Output...

bug
inference

Hi, I tested the native AllReduce (deepspeed.comm.all_reduce) and the compressed AllReduce (backend.compressed_allreduce) in DeepSpeed with [this test script](https://github.com/microsoft/DeepSpeed/blob/master/tests/onebit/test_nccl_perf.py). On a ROCm system, we observed a 414% performance improvement when switching from...

In my training code, I only save and load the model state_dict (no optimizer states). I find this is good enough with a few warmup steps, and it saves lots...

enhancement
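
The idea above can be sketched without any framework: drop the optimizer section of a checkpoint before saving, then rebuild optimizer state via a short warmup after loading. This is a minimal plain-dict sketch; the key names (`module`, `optimizer`) and the helper are hypothetical, not DeepSpeed's checkpoint format.

```python
# Hypothetical helper: keep only the model weights from a full training
# checkpoint; optimizer state is intentionally discarded and will be
# reconstructed by a few warmup steps after resuming.
def strip_optimizer_state(checkpoint: dict) -> dict:
    return {"module": checkpoint["module"]}


full_ckpt = {
    "module": {"layer.weight": [0.1, 0.2]},              # model state_dict
    "optimizer": {"step": 1000, "exp_avg": [0.0, 0.0]},  # dropped on save
}
slim_ckpt = strip_optimizer_state(full_ckpt)
```

The trade-off is checkpoint size versus exact resumption: the slim checkpoint is much smaller, but momentum/variance statistics must be re-estimated during warmup.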