Molly Smith
Bloom with kernel injection was showing a significant logits mismatch compared to the Hugging Face Transformers baseline, as reported in issue https://github.com/microsoft/DeepSpeed/issues/2730. The softmax input_mask is float32, not int64, and needs to be converted to...
Creates a blog post for the automatic tensor parallelism feature.
Expands the unsupported model list and adds more checks for a clean error exit.
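A clean error exit here generally means failing early with a readable message instead of crashing deep inside the sharding code. The following is a minimal sketch under that assumption only; `check_model_supported` and the `unsupported` set are hypothetical and do not reproduce the PR's actual checks or model list.

```python
def check_model_supported(model_type: str, unsupported: set) -> None:
    """Fail early with a clear message if the model type is on the unsupported list.

    Hypothetical helper: the real unsupported list and checks live in the PR itself.
    """
    if model_type in unsupported:
        raise NotImplementedError(
            f"Automatic tensor parallelism does not support model type {model_type!r}"
        )
```

Raising before any weights are touched keeps the failure at the call site the user controls.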
Removes bf16 from the inference config dtype enum because it is not supported. Users should now see a pydantic error listing the supported types instead of a vague CUDA error. ``` pydantic.error_wrappers.ValidationError: 1 validation...
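The idea is that validation against a closed enum rejects an unsupported value up front and names the allowed ones. The sketch below illustrates that with the standard-library `enum` module as a dependency-free stand-in for pydantic; `InferenceDtype`, `parse_dtype`, and the member set (fp16, fp32, int8) are assumptions for illustration, not DeepSpeed's actual config classes.

```python
from enum import Enum


class InferenceDtype(Enum):
    # Hypothetical stand-in for the inference config dtype enum.
    # bf16 is intentionally absent because it is not supported.
    fp16 = "fp16"
    fp32 = "fp32"
    int8 = "int8"


def parse_dtype(value: str) -> InferenceDtype:
    """Convert a user-supplied dtype string, failing with the list of supported types."""
    try:
        return InferenceDtype(value)
    except ValueError:
        supported = ", ".join(member.value for member in InferenceDtype)
        raise ValueError(
            f"unsupported dtype {value!r}; supported types: {supported}"
        ) from None
```

Calling `parse_dtype("bf16")` raises a `ValueError` whose message lists fp16, fp32, and int8, which is the same kind of early, descriptive failure the pydantic error provides.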
Updates the Auto Tensor Parallelism tutorial with a T5 example instead of OPT, since OPT is supported with kernel injection and we would like to showcase a model that does not have...
The number of GPUs (mp_size) needs to be a factor of the model's hidden dimension, embedding dimension, number of attention heads, etc. Otherwise, we encounter various tensor size errors...
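Because every sharded dimension must split evenly, a valid mp_size is exactly a divisor of the greatest common divisor of those dimensions. The sketch below checks this before sharding; `validate_tp_size` and `valid_tp_sizes` are hypothetical helper names, and the dimension keywords passed to them are whatever the caller chooses to check.

```python
from math import gcd


def validate_tp_size(mp_size: int, **dims: int) -> None:
    """Raise a descriptive error if mp_size does not evenly divide every given dimension."""
    if mp_size < 1:
        raise ValueError("mp_size must be a positive integer")
    bad = {name: size for name, size in dims.items() if size % mp_size != 0}
    if bad:
        details = ", ".join(f"{name}={size}" for name, size in bad.items())
        raise ValueError(f"mp_size={mp_size} is not a factor of: {details}")


def valid_tp_sizes(**dims: int) -> list:
    """List every mp_size that divides all dimensions, i.e. the divisors of their GCD."""
    g = 0
    for size in dims.values():
        g = gcd(g, size)  # gcd(0, x) == x, so the first dimension seeds g
    return [d for d in range(1, g + 1) if g % d == 0]
```

For example, with `hidden_size=4096` and `num_heads=32` the GCD is 32, so `valid_tp_sizes` returns `[1, 2, 4, 8, 16, 32]`, while `validate_tp_size(3, hidden_size=4096, num_heads=32)` raises a `ValueError` naming both dimensions.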
I get errors when trying to run the Hugging Face example test-wav2vec2.py. First, I get missing Python package errors (datasets, jiwer). After installing the packages, I see: RuntimeError: Error opening '/home/mosm/.cache/huggingface/datasets/downloads/extracted/e4488bdcc5e36bb8e49ff9b437db0cde3f99b8f604fabd9bc27b267ced1c7967/6930-75918-0000.flac': System error....
Skips auto TP if no tensor parallelism is needed, i.e., when using only 1 GPU. https://github.com/microsoft/DeepSpeed/issues/3285
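The guard amounts to returning the model untouched when there is only one device to shard across. This is a minimal sketch under that reading; `maybe_apply_auto_tp` and the `shard_fn` callback are hypothetical placeholders for the real auto-TP pass, not DeepSpeed's API.

```python
from typing import Any, Callable


def maybe_apply_auto_tp(
    model: Any, world_size: int, shard_fn: Callable[[Any, int], Any]
) -> Any:
    """Return the model unchanged on a single GPU; otherwise shard it.

    shard_fn is a hypothetical callback standing in for the auto-TP pass.
    """
    if world_size <= 1:
        # Nothing to split across devices, so skip the auto-TP pass entirely.
        return model
    return shard_fn(model, world_size)
```

Skipping the pass on one GPU also sidesteps the divisibility checks entirely, since every dimension is trivially divisible by 1.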