[BUG] DeepSpeed loads the whole codegen model into GPU
I am trying 4-way tensor-parallel sharding for "Salesforce/codegen-16B-mono" on 4 A10 GPUs (24 GiB each). The torch dtype is torch.half.
My math says (please double-check): 16B parameters at half precision is roughly 30 GiB, so with correct sharding each GPU should only hold about 7.5 GiB of weights, well within 24 GiB. Instead I got a GPU OOM.
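For reference, here is the back-of-the-envelope calculation (weights only; activations, KV cache, and loading buffers are not counted):

```python
# Rough per-GPU weight memory for tensor-parallel sharding.
# Assumes fp16 (2 bytes/param) and counts weights only.
params = 16e9          # Salesforce/codegen-16B-mono, ~16B parameters
bytes_per_param = 2    # torch.half
world_size = 4         # 4x A10 (24 GiB each)

total_gib = params * bytes_per_param / 2**30
per_gpu_gib = total_gib / world_size
print(f"total weights: {total_gib:.1f} GiB, per GPU: {per_gpu_gib:.1f} GiB")
# total weights: ~29.8 GiB, per GPU: ~7.5 GiB -> should fit in 24 GiB
```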
I found this and tried the injection policy from it. That made the model loadable, but I then hit a reshape error. It seems DeepSpeed needs a special config for the codegen model to make it both loadable and runnable.
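Roughly what I tried looks like the sketch below (not my exact script; I am assuming the block class `CodeGenBlock` and the output-projection names `attn.out_proj` / `mlp.fc_out` from the Hugging Face CodeGen implementation):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM
from transformers.models.codegen.modeling_codegen import CodeGenBlock

model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen-16B-mono", torch_dtype=torch.half
)

# Manual injection policy: tell DeepSpeed which linear layers' outputs
# need an all-reduce when the block is sharded across GPUs.
# CodeGenBlock / attn.out_proj / mlp.fc_out are assumptions based on the
# Hugging Face CodeGen implementation.
model = deepspeed.init_inference(
    model,
    mp_size=4,                      # tensor-parallel degree
    dtype=torch.half,
    injection_policy={CodeGenBlock: ("attn.out_proj", "mlp.fc_out")},
)
```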
Any suggestions? cc @RezaYazdaniAminabadi
Hi @xiejw, codegen is not currently supported because it has a fused qkv, and you're right that we need a special case for it.
Thanks @molly-smith. Do you have any suggestions on how to make it work first, and then make it fast? I think I am OK with a non-fused qkv given that 16B is quite large, so the GPUs will be busy for a while anyway (I could be wrong, though).
That would be much appreciated.
Hi @xiejw,
Can you try this PR, using the kernels as well as mp>1, and see if it works for you? Thanks, Reza
Hi @RezaYazdaniAminabadi
I tried to patch your code manually, but it is not clear to me how to test it.
With deepspeed 0.8.1, new changes throw errors like
assert AutoTP.supported(model), "Automatic policy not supported for model. Please provide policy."
both with and without your changes (manually patched). Passing --use_kernel gives the same error.
deepspeed 0.8.0 is the version I originally used, but its folder structure is different; for example, there is no deepspeed/module_inject/containers folder.
How can I test your PR? Thanks
Hi @xiejw,
Thanks for trying this out. Let me try it on my side again and see if I can repro the same issue. Thanks, Reza
@xiejw, are you trying this the same way I described in the PR?
@xiejw, can you please try this again, passing --replace_method 'auto' when running inference-test.py?
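On the API side, that roughly corresponds to something like the sketch below (not the exact contents of inference-test.py; the model name and mp_size are just your setup from above):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen-16B-mono", torch_dtype=torch.half
)

# replace_method='auto' lets DeepSpeed pick the replacement policy itself,
# and replace_with_kernel_inject=True enables the fused inference kernels
# (what --use_kernel toggles in the example script).
model = deepspeed.init_inference(
    model,
    mp_size=4,                       # launch with `deepspeed --num_gpus 4 ...`
    dtype=torch.half,
    replace_method="auto",
    replace_with_kernel_inject=True,
)
```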