Alex Wortega


I've tried the same thing. It doesn't speed things up, but you can start multiple model instances for users.

I have the same problem. I'm using a fine-tuned `GPTJForCausalLM.from_pretrained()` without the `low_cpu_mem_usage` flag, and generation with DeepSpeed differs from generation without it. I'm on the latest DeepSpeed version and transformers==4.21.2.

Got the same result with `replace_method=None, replace_with_kernel_inject=False`, and a different one with `replace_method='auto', replace_with_kernel_inject=True`.
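For context, the two setups being compared look roughly like this. This is a sketch, not the exact script from the issue: the checkpoint path is hypothetical, and it assumes a single CUDA GPU with DeepSpeed installed.

```python
import torch
import deepspeed
from transformers import GPTJForCausalLM

# Hypothetical path to the fine-tuned checkpoint
model = GPTJForCausalLM.from_pretrained("path/to/finetuned-gptj")

# Setup 1: no kernel injection -- output matched plain Hugging Face generation
ds_model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_method=None,
    replace_with_kernel_inject=False,
)

# Setup 2: fused CUDA kernels injected -- output diverged
ds_model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)
```

With `replace_with_kernel_inject=False` DeepSpeed leaves the original PyTorch modules in place, which would explain why that setup reproduces the Hugging Face output exactly.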

HUGGINGFACE: [{'generated_text': "Try without sampling.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if"}] with...

```
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required...
```

https://colab.research.google.com/drive/1nv-UI30gPx7Hj6laeV2DKwvey3CjCzCM?usp=sharing I reproduced the error on Colab, but not the fix, lol. UPD: the fix reproduces too, you just have to reload the notebook. I provide proof at the end of the notebook.

I've tried:
- Downgrading to 0.5.9 - doesn't help
- Redefining the injection policy as `injection_policy={GPTNeoBlock: ('SelfAttention.o', 'EncDecAttention.o', 'DenseReluDense.wo')}` - doesn't help

What else can I try? I don't know what to do :(
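One thing worth noting: the layer names in that policy (`SelfAttention.o`, `EncDecAttention.o`, `DenseReluDense.wo`) come from DeepSpeed's T5 example, and `GPTNeoBlock` has no submodules with those names, so the policy would silently match nothing. A GPT-Neo policy would presumably need the block's own output projections. This is an unverified sketch; the attribute paths are taken from transformers' `modeling_gpt_neo` and have not been tested as an injection policy.

```python
import deepspeed
from transformers.models.gpt_neo.modeling_gpt_neo import GPTNeoBlock

# In GPTNeoBlock, the attention output projection is attn.attention.out_proj
# and the MLP output projection is mlp.c_proj (per transformers' source).
# Assumes `model` is an already-loaded GPTNeoForCausalLM.
engine = deepspeed.init_inference(
    model,
    mp_size=1,
    injection_policy={GPTNeoBlock: ('attn.attention.out_proj', 'mlp.c_proj')},
)
```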

Hi @RezaYazdaniAminabadi, can you share your `ds_report` + `pip freeze`? Thanks, Alex