DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
**Describe the bug** Encountered an illegal memory access when using the inference engine for a RoBERTa model with long sequences (e.g. 512). For...
**Describe the bug** When using the inference engine for a RoBERTa model, the output is unexpected with batch size > 1....
Refactor DeepSpeed Config sub-configs (i.e., activation checkpointing, autotuning, comms, compression, monitor, nebula, and profiling) to use the pydantic library.
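The sub-config refactor above moves validation into typed model classes. A minimal stdlib sketch of that pattern, approximating with dataclasses what pydantic models provide automatically (typed fields, defaults, validation on construction); the class and field names here mirror the activation-checkpointing sub-config but are illustrative, not DeepSpeed's actual classes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ActivationCheckpointingConfig:
    # Typed fields with defaults stand in for pydantic's Field declarations.
    partition_activations: bool = False
    number_checkpoints: Optional[int] = None

    def __post_init__(self):
        # pydantic would run field validators automatically; with stdlib
        # dataclasses we check invariants by hand after construction.
        if self.number_checkpoints is not None and self.number_checkpoints < 1:
            raise ValueError("number_checkpoints must be >= 1")

cfg = ActivationCheckpointingConfig(partition_activations=True, number_checkpoints=4)
```

The benefit over raw dicts is that a malformed config (e.g. `number_checkpoints=0`) fails loudly at construction time instead of deep inside training.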
### **The code is:** (package versions: transformers==4.21.1, torch==1.11.0, deepspeed==0.6.5, CUDA 11.3, GPU: RTX 3090) ``` import torch from transformers import BertTokenizer, BartForConditionalGeneration, BertModel, BertLMHeadModel from transformers.activations import GELUActivation from deepspeed.profiling.flops_profiler import FlopsProfiler...
May I know why [this training code](https://colab.research.google.com/drive/1v5wY22CkyvKPz21tdwSMPv0T3fsIro0D?usp=sharing#scrollTo=6qJRPd9-sEdK) still gives a CUDA out-of-memory error even after DeepSpeed is turned on? See [this comment](https://github.com/microsoft/DeepSpeed/issues/2029#issuecomment-1229470437) for historical tracking purposes.
Continuing the refactor of distributed unit tests started in #2141 and #2180. Also includes a fix for the broken nightly test (lm-eval).
**Describe the bug** Similar to #2233 and #2133, I'm seeing garbage output when using multi-GPU fp16 inference for GPT-NeoX. Running the script below with GPT-Neo-2.7B instead of GPT-NeoX works fine. Output...
Hi, I tested the native AllReduce (deepspeed.comm.all_reduce) and the compressed AllReduce (backend.compressed_allreduce) in DeepSpeed with [this test script](https://github.com/microsoft/DeepSpeed/blob/master/tests/onebit/test_nccl_perf.py). On a ROCm system, we observed a 414% performance improvement when switching from...
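The speedup reported above comes from sending far fewer bits per element. The compression used by DeepSpeed's one-bit optimizers reduces each value to its sign plus a single scale (the mean absolute magnitude), carrying the quantization residual forward as error feedback. A minimal single-process sketch of just that quantization step, using a hypothetical `one_bit_compress` helper, not the actual `backend.compressed_allreduce` implementation:

```python
def one_bit_compress(values, error):
    """Quantize (values + carried error) to sign * scale.

    Returns the sign vector, the shared scale, and the new
    error-feedback residual to carry into the next step.
    """
    corrected = [v + e for v, e in zip(values, error)]
    # One scale for the whole chunk: mean absolute magnitude.
    scale = sum(abs(c) for c in corrected) / len(corrected)
    signs = [1.0 if c >= 0 else -1.0 for c in corrected]
    # What the receiver would reconstruct from sign * scale.
    decompressed = [s * scale for s in signs]
    # Error feedback: remember what quantization threw away.
    new_error = [c - d for c, d in zip(corrected, decompressed)]
    return signs, scale, new_error

signs, scale, err = one_bit_compress([0.5, -1.5, 2.0, -0.25], [0.0] * 4)
# signs carry 1 bit each; only `scale` is sent at full precision.
```

Over many iterations the error-feedback buffer ensures the quantization noise averages out rather than accumulating, which is why the optimizer still converges despite the aggressive 1-bit payload.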
In my training code, I only save & load the model state_dict (no optimizer state). I find this is good enough after a few warmup steps, and it saves lots...
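The checkpointing pattern described above can be sketched with stdlib pickling and hypothetical stand-in dicts (real code would use `model.state_dict()` / `optimizer.state_dict()` from PyTorch): only the model weights go into the checkpoint, and the optimizer's moment buffers are rebuilt during a short warmup after resume.

```python
import io
import pickle

# Hypothetical stand-ins for model.state_dict() and optimizer.state_dict().
model_state = {"layer.weight": [0.1, 0.2], "layer.bias": [0.0]}
optimizer_state = {"exp_avg": [0.01, 0.02], "step": 1000}  # intentionally NOT saved

# Save: the checkpoint holds the model weights only.
buf = io.BytesIO()
pickle.dump(model_state, buf)

# Resume: load weights, then re-initialize the optimizer from scratch;
# its moment estimates are rebuilt during a few warmup steps instead
# of being restored from disk.
buf.seek(0)
restored = pickle.load(buf)
```

Skipping optimizer state roughly halves the checkpoint for Adam-style optimizers (which keep two full-size moment buffers per parameter), at the cost of those warmup steps after each restart.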