brevity2021 comments

Results: 6 comments by brevity2021

[BUG] Pythia (GPT-NeoX based) models degrade in generation quality using DeepSpeed Inference

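Also running into this issue. Code to reproduce (run with `deepspeed --num_gpus=1`). DeepSpeed version is 0.9.2.

```
import torch
from transformers import AutoModelForCausalLM

model_name = "EleutherAI/pythia-70m-deduped"
# Load the model in half precision, switch to eval mode, and move it to the GPU.
model = (
    AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
    )
    .eval()
    .to("cuda")
)
...
```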

[QUESTION] How to figure out correct `injection_policy` for Flan-T5

According to this [auto tensor parallelism](https://github.com/microsoft/DeepSpeed/blob/4ae3a3da0dfd19d7ab7a76e7c742ac12f44fc1c0/docs/_tutorials/automatic-tensor-parallelism.md) doc, T5 models should no longer need an injection policy?

[BUG] Incorrect Model Outputs When Using Beam Search

We need `num_beams` > 1 also to actually use DeepSpeed.

Zero copy may lead to wrong text generation results

Thanks for the reply! I was testing with ORT 1.12. It might be due to the model implementation (since I was using Pegasus instead of T5, and had to modify the...

Zero copy may lead to wrong text generation results

@pommedeterresautee Here are the notebooks to replicate the error. To make things easier, I use a T5 model for illustration. I first run `make docker_build` from a clean `transformer-deploy` directory,...

Zero copy may lead to wrong text generation results

@c-schumacher This [[notebook]](https://github.com/ELS-RD/transformer-deploy/blob/main/demo/generative-model/t5.ipynb) mentions "Version 1.11.1 of ONNX Runtime and older have a bug which makes them much slower when most inputs are used by subgraphs of an If node....