
[BUG] Embedding is not split during GPT2 inference

Open · katitizhou opened this issue 3 years ago · 1 comment

Describe the bug
I run `deepspeed --num_gpus 2 inference-test.py --name gpt2`. I modified inference-test.py because I do not want to use the .cu files (kernel injection). It seems that the GPT2 embedding is not split across the two GPUs, because I get the error "RuntimeError: shape '[1, 7, 6, 64]' is invalid for input of size 5376". How do I set the right injection_policy to split wte and wpe for GPT2?


```python
if args.ds_inference:
    pipe.model = deepspeed.init_inference(pipe.model,
                                          dtype=data_type,
                                          mp_size=world_size,
                                          # replace_with_kernel_inject=True,
                                          replace_with_kernel_inject=False,
                                          injection_policy={GPT2Block: ('attention.out_proj', 'mlp.c_proj')},
                                          max_out_tokens=args.max_tokens,
                                          **ds_kwargs)
```
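For context, the failing reshape arithmetic can be sketched as follows. This is a hypothetical reconstruction assuming the default GPT2 dimensions (hidden size 768, 12 attention heads, head dimension 64) and the 7-token sequence implied by the error message:

```python
# Why the reshape to [1, 7, 6, 64] fails: the per-rank attention view expects
# half the heads (12 // 2 = 6), but the activation tensor was never split,
# so it still holds the full hidden dimension.
hidden_size, num_heads, head_dim = 768, 12, 64  # GPT2 defaults (assumed)
seq_len, mp_size = 7, 2

full_elems = 1 * seq_len * hidden_size                     # unsplit input: 5376 elements
per_rank_elems = 1 * seq_len * (num_heads // mp_size) * head_dim  # [1, 7, 6, 64] = 2688

print(full_elems, per_rank_elems)  # 5376 vs 2688 -- the tensor was not partitioned
```

The mismatch (5376 elements squeezed into a 2688-element view) is consistent with the embedding and attention weights not having been sharded across the two ranks.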

To Reproduce
Steps to reproduce the behavior: modify inference-test.py as follows:

```python
if args.ds_inference:
    pipe.model = deepspeed.init_inference(pipe.model,
                                          dtype=data_type,
                                          mp_size=world_size,
                                          # replace_with_kernel_inject=True,
                                          replace_with_kernel_inject=False,
                                          injection_policy={GPT2Block: ('attention.out_proj', 'mlp.c_proj')},
                                          max_out_tokens=args.max_tokens,
                                          **ds_kwargs)
```

Then run: `deepspeed --num_gpus 2 inference-test.py --name gpt2`

Expected behavior
GPT2 should run with 2-way tensor parallelism.

ds_report output: not provided.

Screenshots: attached in the original issue.

System info:
- transformers 4.21.2
- deepspeed 0.7.7

katitizhou avatar Jan 09 '23 10:01 katitizhou

What is the right way to set injection_policy to split wte and wpe for GPT2? I have found no relevant examples for this.

katitizhou avatar Feb 01 '23 02:02 katitizhou

Hi @katitizhou, GPT2 is not supported for tensor parallelism without kernel injection. You can split GPT2 across multiple GPUs by setting replace_with_kernel_inject=True and removing the injection_policy.
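The suggested fix can be sketched as the following change to the snippet above. This is a config fragment, not a runnable script: `pipe`, `data_type`, `world_size`, and `ds_kwargs` come from the original inference-test.py, and the argument names (`mp_size`, `replace_with_kernel_inject`) follow the DeepSpeed 0.7.x API shown in the report.

```python
import deepspeed

if args.ds_inference:
    pipe.model = deepspeed.init_inference(pipe.model,
                                          dtype=data_type,
                                          mp_size=world_size,
                                          # Enable kernel injection so DeepSpeed replaces
                                          # GPT2 modules with its own tensor-parallel kernels.
                                          replace_with_kernel_inject=True,
                                          # No injection_policy: the built-in GPT2 kernel
                                          # injection handles the module mapping itself.
                                          max_out_tokens=args.max_tokens,
                                          **ds_kwargs)
```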

molly-smith avatar Mar 10 '23 02:03 molly-smith