andrasiani issues


Results: 2 issues



Deepspeed stage-3 student+teacher crash

Hi, I have a 1.5B-parameter pretrained GPT-XL teacher network in fp16 with requires_grad=False. The student network is a small GPT with 142M parameters. I use PyTorch Lightning...

Labels: bug, training

partition_activations produces no activation memory improvement with zero3

Hi, I am trying to run a GPT-2 model with a block size of 2048, and I cannot use a batch size larger than 16 because activation memory becomes too large. To reduce activation memory...

Labels: stale