DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
**Describe the bug** When I use `deepspeed.init_inference` for the Megatron GPT-3 MoE model in the [megatron repo](https://github.com/microsoft/Megatron-DeepSpeed), an error occurs. There is no problem when I use `deepspeed.init` instead, as is done in the training...
This is a follow-up PR to the existing one: https://github.com/microsoft/DeepSpeed/pull/2127 The goal is to keep this PR open and investigate the issue in more detail while the PR above removes...
**Describe the bug** The sample AlexNet example for the profiler does not work. https://www.deepspeed.ai/tutorials/flops-profiler/ **To Reproduce** Copy-paste the example as-is and run it. **Expected behavior** It should produce the three results: flops, macs,...
**Is your feature request related to a problem? Please describe.** The current examples for DeepSpeed inference use the command-line tool 'deepspeed', which internally uses DeepSpeed's launcher modules to initialize the...
Hi, DeepSpeed team. I want to know more details about the example described in [BERT Pre-training](https://www.deepspeed.ai/tutorials/bert-pretraining). It took 8 hr 41 min with 4 DGX-2 systems. I wonder how many...
**Is your feature request related to a problem? Please describe.** DeepSpeed is a library for high-speed training of large models, but most DL developers use Azure VMs and...
I got the following output when I installed using pip:
```
(yzy) C:\Users\hg>pip install deepspeed
Collecting deepspeed
  Using cached deepspeed-0.7.0.tar.gz (629 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  ×...
```
**Describe the bug** When I use ZeRO optimization (stage=3), it spends a lot of time loading the model. I'm trying to fine-tune OPT-66B on 2 nodes; each node contains 8× NVIDIA A100-SXM (80 GB) and 1 TB RAM. I have...
Hi, On a note related to #325, I have added a conda recipe: https://github.com/conda-forge/staged-recipes/pull/14699 But there seems to be a weird bug related to ninja: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=307963&view=logs&j=6f142865-96c3-535c-b7ea-873d86b887bd&t=22b0682d-ab9e-55d7-9c79-49f3c3ba4823&l=1431 Any help and/or insight...
In optimizer partitioning, the parameters are fused into one big vector, which is then partitioned over workers. So the number of chunks can be much smaller than the number of...
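A minimal sketch of the idea described above, assuming a simplified model (this is illustrative only, not DeepSpeed's actual implementation): parameter tensors are treated as a single flat vector and split into one contiguous chunk per worker, so the chunk count depends on the number of workers, not the number of parameters.

```python
def partition_params(param_sizes, num_workers):
    """Fuse parameters (given by their element counts) into one flat
    vector and split it into num_workers near-equal contiguous chunks.
    Returns a list of (start, end) index ranges, one per worker."""
    total = sum(param_sizes)
    # Distribute any remainder one element at a time to the lowest ranks,
    # so chunk sizes differ by at most one element.
    base, rem = divmod(total, num_workers)
    ranges = []
    start = 0
    for rank in range(num_workers):
        size = base + (1 if rank < rem else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# Example: three parameters of sizes 10, 7, and 3 fuse into a 20-element
# vector; partitioned over 4 workers, each worker owns 5 elements.
print(partition_params([10, 7, 3], 4))  # [(0, 5), (5, 10), (10, 15), (15, 20)]
```

Note that the worker boundaries ignore the original parameter boundaries entirely, which is why the number of chunks equals the number of workers rather than the (much larger) number of parameter tensors.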