Alex Wortega


I've tried the same thing. It doesn't speed things up, but you can start multiple model instances for users.

I have the same problem. I'm using a fine-tuned `GPTJForCausalLM.from_pretrained()` without the `low_cpu_mem_usage` flag, and generation with DeepSpeed differs from generation without it. I'm on the latest DeepSpeed version and transformers==4.21.2.

Got the same result with `replace_method=None, replace_with_kernel_inject=False`, and a different one with `replace_method='auto', replace_with_kernel_inject=True`.
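For context, the two setups being compared look roughly like this. This is a sketch, not the exact script from the issue: the checkpoint path is hypothetical, and it assumes a single CUDA GPU with DeepSpeed installed.

```python
import torch
import deepspeed
from transformers import GPTJForCausalLM

# Hypothetical path to the fine-tuned checkpoint
model = GPTJForCausalLM.from_pretrained("path/to/finetuned-gptj")

# Setup 1: no kernel injection -- output matched plain Hugging Face generation
ds_model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_method=None,
    replace_with_kernel_inject=False,
)

# Setup 2: fused CUDA kernels injected -- output diverged
ds_model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)
```

With `replace_with_kernel_inject=False` DeepSpeed leaves the original PyTorch modules in place, which would explain why that setup reproduces the Hugging Face output exactly.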

HUGGINGFACE: [{'generated_text': "Try without sampling.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if"}] with...

```
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required...
```

https://colab.research.google.com/drive/1nv-UI30gPx7Hj6laeV2DKwvey3CjCzCM?usp=sharing I reproduced the error on Colab, but not the fix, lol. UPD: the fix reproduces too, you just have to reload the notebook. I provide proof at the end of the notebook.

I've tried:
- Downgrading to 0.5.9 - doesn't help
- Redefining the injection policy as `injection_policy={GPTNeoBlock: ('SelfAttention.o', 'EncDecAttention.o', 'DenseReluDense.wo')}` - doesn't help

What else can I try? I don't know what to do :(
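One thing worth noting: the layer names in that policy (`SelfAttention.o`, `EncDecAttention.o`, `DenseReluDense.wo`) come from DeepSpeed's T5 example, and `GPTNeoBlock` has no submodules with those names, so the policy would silently match nothing. A GPT-Neo policy would presumably need the block's own output projections. This is an unverified sketch; the attribute paths are taken from transformers' `modeling_gpt_neo` and have not been tested as an injection policy.

```python
import deepspeed
from transformers.models.gpt_neo.modeling_gpt_neo import GPTNeoBlock

# In GPTNeoBlock, the attention output projection is attn.attention.out_proj
# and the MLP output projection is mlp.c_proj (per transformers' source).
# Assumes `model` is an already-loaded GPTNeoForCausalLM.
engine = deepspeed.init_inference(
    model,
    mp_size=1,
    injection_policy={GPTNeoBlock: ('attn.attention.out_proj', 'mlp.c_proj')},
)
```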

Hi @RezaYazdaniAminabadi, can you share your `ds_report` + `pip freeze`? Thanks, Alex