Jeff Rasley

Results 18 issues of Jeff Rasley

DeepSpeed has support for several dtypes now (i.e., fp32, fp16, bf16). However, it's becoming less clear which parts of training use which dtypes, and at what time. For example, in...

enhancement

We noticed our DeepSpeed + Accelerate unit tests are failing on torch 1.8. `torch.distributed.run` requires torch 1.9+, so this bumps your min torch version to 1.9. If you'd rather guard the...

In AML deployments the model dir is not writeable, so download the config/tokenizer to a writeable cache path.
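
One way to picture the fallback described above is the minimal sketch below. It is not the actual MII change: the helper name `pick_writeable_dir`, the `MII_CACHE_PATH` environment variable, and the `mii_cache` temp subdirectory are all hypothetical names chosen for illustration.

```python
import os
import tempfile


def pick_writeable_dir(model_dir, cache_env="MII_CACHE_PATH"):
    """Return model_dir if it is writeable, otherwise a writeable cache dir.

    Both the function name and the MII_CACHE_PATH variable are assumptions
    made for this sketch; they are not taken from the MII code base.
    """
    # Use the model dir directly when it exists and the process can write to it.
    if os.path.isdir(model_dir) and os.access(model_dir, os.W_OK):
        return model_dir
    # Otherwise fall back to a user-supplied cache path, or a temp subdirectory.
    cache_dir = os.environ.get(cache_env) or os.path.join(
        tempfile.gettempdir(), "mii_cache"
    )
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir
```

The returned path could then be handed to whatever call downloads the config and tokenizer, so nothing is written into the read-only model dir.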

Provide a local AML deployment option. This will use the [AML inference server](https://pypi.org/project/azureml-inference-server-http/) for the front end. We can then easily deploy an MII-generated score file via: `azmlinfsrv --model_dir --entry_script...

enhancement

After #25 is complete we want to expose all DS-inference configs (https://deepspeed.readthedocs.io/en/latest/inference-init.html#deepspeed.init_inference) and ZeRO inference configs in the MII config dictionary.

enhancement

https://github.com/huggingface/transformers/pull/18261 introduces model arg validation, which is not compatible with how ds-inference was originally set up. We no longer need to do all of the things we previously did in an...

Checking to #2310 allows us to run our mp>1 neox tests.

- [x] add pre/post forward methods
- [x] add generate method if the wrapped module has this attribute
- [ ] add documentation for new pre/post forward calls to RTD...
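
The first two checklist items can be sketched roughly as follows. This is an illustrative pattern only, not the DeepSpeed implementation: the class name `ForwardWrapper` and the hook names `_pre_forward` and `_post_forward` are hypothetical.

```python
class ForwardWrapper:
    """Hypothetical wrapper with pre/post forward hooks (names are assumptions)."""

    def __init__(self, module):
        self.module = module
        # Expose generate only when the wrapped module provides it.
        if hasattr(module, "generate"):
            self.generate = self._generate

    def _pre_forward(self, *args, **kwargs):
        # Hook run before the wrapped forward; the default passes inputs through.
        return args, kwargs

    def _post_forward(self, output):
        # Hook run after the wrapped forward; the default returns output unchanged.
        return output

    def __call__(self, *args, **kwargs):
        args, kwargs = self._pre_forward(*args, **kwargs)
        output = self.module(*args, **kwargs)
        return self._post_forward(output)

    def _generate(self, *args, **kwargs):
        # Delegate straight to the wrapped module's generate.
        return self.module.generate(*args, **kwargs)
```

Subclasses would override the two hooks to adjust inputs or outputs, while callers keep using the wrapper as they would the wrapped module.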