[BUG] Embedding is not split during inference with GPT2
Describe the bug
deepspeed --num_gpus 2 inference-test.py --name gpt2
I modified inference-test.py because I do not want to use the .cu files. It seems that the GPT2 embedding is not split across the two GPUs, since I get the error "RuntimeError: shape '[1, 7, 6, 64]' is invalid for input of size 5376". How do I set the right injection_policy to split wte and wpe for GPT2?


```python
if args.ds_inference:
    pipe.model = deepspeed.init_inference(
        pipe.model,
        dtype=data_type,
        mp_size=world_size,
        # replace_with_kernel_inject=True,
        replace_with_kernel_inject=False,
        injection_policy={GPT2Block: ('attention.out_proj', 'mlp.c_proj')},
        max_out_tokens=args.max_tokens,
        **ds_kwargs,
    )
```
To Reproduce
Steps to reproduce the behavior:
modify inference-test.py with the same init_inference change shown above
run: deepspeed --num_gpus 2 inference-test.py --name gpt2
Expected behavior
GPT2 should run with 2-way tensor parallelism.
System info (please complete the following information):
transformers 4.21.2
deepspeed 0.7.7
What is the right way to set injection_policy to split wte and wpe for GPT2? I could not find any relevant examples for this.
Hi @katitizhou, GPT2 is not supported for tensor parallelism without kernel injection. You can split GPT2 across multiple GPUs by setting replace_with_kernel_inject to True and removing the injection_policy.
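For reference, here is a minimal sketch of that suggestion, assuming the same setup as inference-test.py; world_size, data_type, and max_tokens are placeholders standing in for the script's arguments:

```python
import torch
import deepspeed
from transformers import pipeline

# Placeholder values standing in for the script's arguments.
world_size = 2
data_type = torch.float16
max_tokens = 50

pipe = pipeline("text-generation", model="gpt2")

# With replace_with_kernel_inject=True, DeepSpeed applies its built-in
# GPT2 policy to split the model across GPUs, so no injection_policy
# is passed.
pipe.model = deepspeed.init_inference(
    pipe.model,
    dtype=data_type,
    mp_size=world_size,
    replace_with_kernel_inject=True,
    max_out_tokens=max_tokens,
)
```

Launch it the same way as before: deepspeed --num_gpus 2 inference-test.py --name gpt2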