DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
**Describe the bug** Many modern transformer components (e.g., RoPE, certain Layer Norm setups) need to be stored and run in FP32. Most of the time, we can accomplish this by...
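The precision issue this report alludes to can be illustrated without a GPU. Below is a minimal, hypothetical sketch (plain Python, not DeepSpeed code) that simulates bfloat16 by truncating the float32 mantissa and shows how a RoPE position angle drifts outside FP32; the `to_bf16` helper and the chosen position/frequency are illustrative assumptions.

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Simulate bfloat16 by truncating the float32 mantissa.

    This keeps the sign, exponent, and top 7 mantissa bits,
    which is what bfloat16 stores (simulation only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# RoPE rotates query/key pairs by angle = position * inv_freq.
# At large positions the angle is large, and rounding both factors
# to bf16 shifts the angle enough to visibly change cos/sin.
position = 100_000
inv_freq = 1.0 / 10_000.0  # one of the standard RoPE inverse frequencies

angle_fp32 = position * inv_freq
angle_bf16 = to_bf16(position) * to_bf16(inv_freq)

err = abs(math.cos(angle_fp32) - math.cos(angle_bf16))
print(f"angle error: {abs(angle_fp32 - angle_bf16):.4f}, cos error: {err:.4f}")
```

This is why such components are typically kept in FP32 even when the rest of the model trains in mixed precision.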
I am trying to use the universal checkpoint conversion code, `python ds_to_universal.py `, but I get an error saying a layer number can't be found. I'm not sure why, but I...
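For context, the conversion step being attempted usually looks like the sketch below (flag names as in DeepSpeed's checkpoint utilities; the paths are placeholders, and the exact options depend on how the checkpoint was saved):

```shell
# Convert a ZeRO checkpoint into DeepSpeed's universal format.
# Point --input_folder at the global_step directory that holds
# the ZeRO shard files (paths here are placeholders).
python ds_to_universal.py \
    --input_folder  /path/to/checkpoints/global_step1000 \
    --output_folder /path/to/checkpoints/global_step1000_universal
```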
**Describe the bug** I want to train a Dolly 2.0 2.8B model using DeepSpeed, but the terminal output is always the same. Did I miss something? Without DeepSpeed it...
**Describe the bug** I'm using the DeepSpeed MoE layer to build a multi-modal LLM. Phi-3 is the base model, and I replaced its MLP layers with the DeepSpeed MoE layer....
pp_size = 8. Stage 0 contains a 45-layer vision encoder; stages 1~7 contain 56 decoder layers. ZeRO stage 0 works fine, but ZeRO stage 1 with bf16/fp16 fails much...
I am experiencing excessive CPU and GPU memory usage when running multi-GPU inference with DeepSpeed. Specifically, the memory usage does not scale as expected when increasing the number of GPUs....
**Describe the bug** When using pipelining (with or without `LayerSpec` inside `PipelineModule`), the first GPU seems to have a considerably higher memory consumption, compared to the other ones. This is...
**Describe the bug** Hello. I'm an active user of DeepSpeed for multi-node training. I've always used ZeRO-3, but this time I tried enabling the hpz (hierarchical partitioning) feature of ZeRO++ for the...