
16 GB of GPU memory runs out

Open itsmasabdi opened this issue 2 years ago • 2 comments

Hi.

I'm trying to train this model on a single P100 with 16 GB of memory, but I seem to be running out of memory with a batch size of 2. Do I need more than 16 GB for this model? How can I reduce the GPU memory usage?

Cheers,

itsmasabdi, May 27 '23 09:05

Hey, you can try the following:

  1. Use a smaller text encoder and a smaller diffusion model if you are training from scratch.
  2. Use the Adafactor or 8-bit Adam optimizer. This should reduce memory consumption significantly.
  3. Use gradient checkpointing with accelerate.
  4. Use a batch size of 1 without augmentation.
  5. If memory still runs out, use DeepSpeed ZeRO with CPU offload.
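
To give a feel for why point 2 helps, here is a rough back-of-the-envelope estimate of optimizer state memory. It assumes the usual figures of two fp32 states per parameter for Adam and two 8-bit states per parameter for 8-bit Adam. The parameter count is a placeholder, not tango's actual size, and weights, gradients, and activations are not counted.

```python
# Rough optimizer-state memory estimate (weights, gradients and activations excluded).
# Bytes of optimizer state per trainable parameter:
#   adam_fp32: two fp32 moment estimates -> 2 * 4 bytes
#   adam_8bit: two 8-bit moment estimates -> 2 * 1 byte (small quantization overhead ignored)
OPTIMIZER_STATE_BYTES = {"adam_fp32": 8, "adam_8bit": 2}


def optimizer_state_gib(num_params: int, optimizer: str) -> float:
    """Return the approximate optimizer state size in GiB."""
    return num_params * OPTIMIZER_STATE_BYTES[optimizer] / 2**30


if __name__ == "__main__":
    num_params = 1_000_000_000  # hypothetical model size, for illustration only
    for name in OPTIMIZER_STATE_BYTES:
        print(f"{name}: {optimizer_state_gib(num_params, name):.2f} GiB")
```

For a model of that hypothetical size, the optimizer states alone shrink from roughly 7.5 GiB to under 2 GiB, which matters a lot on a 16 GB card.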

You can follow this guide: https://huggingface.co/docs/transformers/perf_train_gpu_one
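
For point 5, a minimal sketch of a DeepSpeed config that enables ZeRO stage 2 with the optimizer states offloaded to CPU, built from Python. The config keys are standard DeepSpeed ones, but the stage, the batch settings, and the accumulation steps are assumptions for illustration. How the config gets wired into tango's training script is not shown here.

```python
import json

# Illustrative values only; tune them for your own setup.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # batch size of 1 per GPU, as in point 4
    "gradient_accumulation_steps": 2,      # assumed; keeps an effective batch size of 2
    "zero_optimization": {
        "stage": 2,                        # partition optimizer states and gradients
        "offload_optimizer": {
            "device": "cpu",               # keep optimizer states in CPU RAM
            "pin_memory": True,
        },
    },
}

# Serialize to JSON; save the text to a file and point your launcher at it.
config_text = json.dumps(ds_config, indent=2)

if __name__ == "__main__":
    print(config_text)
```

CPU offload trades GPU memory for slower steps, so it is worth trying only after the cheaper options above.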

deepanwayx, May 28 '23 16:05

Hi @deepanwayx, have you trained tango with DeepSpeed? I have run into some problems. Could you offer some advice?

chenxinglili, Mar 22 '24 08:03