Trian Xylouris comments

Results 33 comments of Trian Xylouris

Errors in generation (Bloom) when changing options sampling/use_cache

> Hi @mayank31398,
>
> I am still working on this. Can I ask what an average maximum number of tokens for an input would be? Potentially, this can go...

TypeError: getattr(): attribute name must be string

Hey @henrydylan - have you tried just specifying an optimizer via e.g.

```
..."optimizer": { "type": "Adam", "params": { "lr": 0.00015 } },...
```

Maybe that could help. Also, you...
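As a rough sketch of the suggestion above, the quoted `optimizer` fragment can be merged into a larger config dictionary and serialized to JSON. Only the `optimizer` section comes from the comment. The `with_optimizer` helper and the `train_batch_size` value are hypothetical, added purely for illustration, and the rest of the original config is not shown in the snippet.

```python
import json

# The "optimizer" section as quoted in the comment (the only part taken from the source).
optimizer_section = {
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 0.00015},
    }
}


def with_optimizer(base_config: dict, optimizer_section: dict) -> dict:
    """Return a shallow copy of base_config with the optimizer section merged in."""
    merged = dict(base_config)
    merged.update(optimizer_section)
    return merged


# Hypothetical base config: the real one is elided in the original comment.
base_config = {"train_batch_size": 8}

ds_config = with_optimizer(base_config, optimizer_section)
print(json.dumps(ds_config, indent=2))
```

The merged dictionary can then be written to a JSON file or passed along as the training config, depending on how the surrounding script is set up.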

Can the fairseq-13b model be used commercially? Which license applies?

Also, thanks from my side @stephenroller for the huge amount of work you have made available to all of us! One question: I understand the limitations of this technology and...

[BUG] DeepSpeed Inference with GPT-J using batches with padding gives wrong outputs

Hey @tomerip - were you able to find a workaround? I am experiencing the same problem with gpt-models.

[BUG] DeepSpeed Inference with GPT-J using batches with padding gives wrong outputs

Great, thanks @RezaYazdaniAminabadi

[BUG] DeepSpeed Inference with GPT-J using batches with padding gives wrong outputs

Hi @RezaYazdaniAminabadi, just checking whether you have had a chance to work on that PR yet?

[BUG] DeepSpeed Inference with GPT-J using batches with padding gives wrong outputs

Happy to help with testing any potential fixes! If it will still take some time, then it would be great if there is a link with Bloom's fix, so that...

[BUG] DeepSpeed Inference with GPT-J using batches with padding gives wrong outputs

Thanks @RezaYazdaniAminabadi for fixing this! Commit 4abd455521965930d0e921de8afc0073ea7df9d1 from the [PR you mentioned](https://github.com/microsoft/DeepSpeed/pull/2212) fixes the problem when I tested it using a Hugging Face `gpt2` model. By the way: The commit aafba00c81eaf29c0c2b209a94bc31f4de942936...

[BUG] Illegal memory access CUDA error when using long sequences

Below is a possibly related bug. I added some sample code to reproduce this error for a `GPT2` model on an NVIDIA A10G. Let me know @RezaYazdaniAminabadi @cmikeh2 if you...

[BUG] Illegal memory access CUDA error when using long sequences

FYI @mallorbc, @tomeras91, @RezaYazdaniAminabadi: My related issue, which I detailed above, is fixed in [this PR](https://github.com/microsoft/DeepSpeed/pull/2212). More precisely, my issue does not appear when I install the...