brevity2021

Results 7 issues of brevity2021

Hi, first, thank you for the great work! I was playing with the `t5` notebook in `demo/generative-model`. I built a Docker image through the Makefile and ran the notebook from the...

When I run `docker pull ghcr.io/els-rd/transformer-deploy:0.5.0`, the result is: `Error response from daemon: manifest unknown`. If I change `0.5.0` to `0.4.0`, it works. Does the `0.5.0` tag exist?

**Describe the bug** I was following HuggingFace's script for DeepSpeed inference and found that it doesn't work when `kernel_inject` is False. **To Reproduce** Script: https://github.com/huggingface/transformers-bloom-inference/blob/main/bloom-inference-scripts/bloom-ds-inference.py Change `kernel_inject=True` (line 121) to `kernel_inject=False`...

bug
inference

**Describe the bug** I'm playing with text generation on vanilla Flan-T5-XL using DeepSpeed inference. When both use fp16, the DeepSpeed inference generation result diverges from the Hugging Face result (and...

bug
inference

**Describe the bug** Running the example script in the [auto tensor parallelism doc](https://github.com/microsoft/DeepSpeed/blob/4ae3a3da0dfd19d7ab7a76e7c742ac12f44fc1c0/docs/_tutorials/automatic-tensor-parallelism.md) only works when the model fits on a single GPU. For example, I was using a `g5.12xlarge`...

bug
inference

**Describe the bug** I was running the DeepSpeed inference [example](https://github.com/microsoft/DeepSpeedExamples/blob/8e4ec02c1545f7bd87d3bfe5daaafa5a5f1fe6a6/inference/huggingface/text-generation/inference-test.py) with kernel injection set to False, and the script has trouble loading checkpoints. (If kernel injection is set to True, it...

bug
inference

Hi, I was trying the "zero copy" method from the t5 notebook on a seq2seq transformer model. When I set `clone_tensor` to True, everything looks fine, just not as...

bug