A6000 GPU: Getting torch.cuda.OutOfMemoryError: CUDA out of memory
I am trying to fine-tune a DreamBooth model using the diffusers repo.
GPU: A6000, 48 GB VRAM
but I am getting the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 47.54 GiB total capacity; 43.42 GiB already allocated; 1.50 GiB free; 44.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
With the following settings:
learning rate: 1e-6
lr_scheduler: polynomial
max_train_steps: 3000
batch_size: 3
lr_warmup_steps: 200
use_8bit_adam and gradient_checkpointing are both False
Can anyone please help me? Even with such a large GPU RAM, how can I get a CUDA OOM error? 😅
Hey @geekyayush,
Can you try simply reducing the batch size?
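Besides lowering the batch size, the two options you have disabled (use_8bit_adam and gradient_checkpointing) plus gradient accumulation are the usual memory levers for this script. A minimal sketch of a launch command, assuming the standard `examples/dreambooth/train_dreambooth.py` flags from the diffusers repo (model name, paths, and prompt below are placeholders for your setup):

```shell
# Reduce allocator fragmentation, as the OOM message suggests (value is a guess; tune it)
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./instance_images" \
  --instance_prompt="a photo of sks dog" \
  --output_dir="./dreambooth-out" \
  --train_batch_size=1 \
  --gradient_accumulation_steps=3 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --learning_rate=1e-6 \
  --lr_scheduler="polynomial" \
  --lr_warmup_steps=200 \
  --max_train_steps=3000
```

With `--train_batch_size=1` and `--gradient_accumulation_steps=3`, the effective batch size stays at 3 while only one sample's activations live in memory at a time.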
Thanks @patrickvonplaten! Reducing the batch size to 1 worked.
If you don't mind, could you please tell me about batch_size? I am relatively new to ML and have been trying to follow some tutorials, but I could not find anything on batch_size.
Maybe you can share some documentation links about batch_size?
I would really appreciate it. Thanks again, @patrickvonplaten!
https://www.google.com/search?q=batch+size+in+machine+learning&oq=batch+size+in+machine+learning&aqs=chrome..69i57.2817j0j1&sourceid=chrome&ie=UTF-8
Maybe this helps?
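For intuition: the batch size is the first dimension of every activation tensor the model keeps around during a training step, so memory use grows roughly linearly with it. A minimal stdlib-only Python sketch (the tensor shape here is hypothetical, loosely modeled on Stable Diffusion's 4×64×64 latents, and real training keeps many such tensors per layer):

```python
def activation_bytes(batch_size, channels=4, height=64, width=64, dtype_bytes=4):
    """Bytes for ONE float32 activation tensor of shape (batch, C, H, W).

    A real model stores hundreds of these per forward pass, which is why
    going from batch_size=3 to batch_size=1 can be the difference between
    OOM and training successfully.
    """
    return batch_size * channels * height * width * dtype_bytes


for bs in (1, 3):
    mib = activation_bytes(bs) / 2**20
    print(f"batch_size={bs}: {mib:.4f} MiB per latent-sized tensor")
```

The absolute numbers here are tiny because this is a single tensor; the point is the linear scaling, which holds for the model's full activation footprint as well.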
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.