A6000 GPU: Getting torch.cuda.OutOfMemoryError: CUDA out of memory
I am trying to fine-tune a DreamBooth model using the diffusers repo.
GPU: A6000, 48 GB VRAM
but I am getting the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 47.54 GiB total capacity; 43.42 GiB already allocated; 1.50 GiB free; 44.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
With the following settings:
learning rate: 1e-6
lr_scheduler: polynomial
max_train_steps: 3000
batch_size: 3
lr_warmup_steps: 200
use_8bit_adam and gradient_checkpointing are both False
Can anyone please help me? Even with such a large GPU RAM, how can I get a CUDA OOM error? 😅
Hey @geekyayush,
Can you try simply reducing the batch size?
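Besides lowering the batch size, the two options you have disabled (use_8bit_adam and gradient_checkpointing) plus gradient accumulation are the usual memory levers for this script. A minimal sketch of a launch command, assuming the standard `examples/dreambooth/train_dreambooth.py` flags from the diffusers repo (model name, paths, and prompt below are placeholders for your setup):

```shell
# Reduce allocator fragmentation, as the OOM message suggests (value is a guess; tune it)
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./instance_images" \
  --instance_prompt="a photo of sks dog" \
  --output_dir="./dreambooth-out" \
  --train_batch_size=1 \
  --gradient_accumulation_steps=3 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --learning_rate=1e-6 \
  --lr_scheduler="polynomial" \
  --lr_warmup_steps=200 \
  --max_train_steps=3000
```

With `--train_batch_size=1` and `--gradient_accumulation_steps=3`, the effective batch size stays at 3 while only one sample's activations live in memory at a time.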
Thanks @patrickvonplaten! Reducing the batch size to 1 worked.
If you don't mind, could you please tell me about batch_size? I am relatively new to ML and have been trying to follow some tutorials, but I could not find anything on batch_size.
Maybe you can share some documentation links about batch_size?
I would really appreciate it. Thanks again, @patrickvonplaten!
https://www.google.com/search?q=batch+size+in+machine+learning&oq=batch+size+in+machine+learning&aqs=chrome..69i57.2817j0j1&sourceid=chrome&ie=UTF-8
Maybe this helps?
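For intuition: the batch size is the first dimension of every activation tensor the model keeps around during a training step, so memory use grows roughly linearly with it. A minimal stdlib-only Python sketch (the tensor shape here is hypothetical, loosely modeled on Stable Diffusion's 4×64×64 latents, and real training keeps many such tensors per layer):

```python
def activation_bytes(batch_size, channels=4, height=64, width=64, dtype_bytes=4):
    """Bytes for ONE float32 activation tensor of shape (batch, C, H, W).

    A real model stores hundreds of these per forward pass, which is why
    going from batch_size=3 to batch_size=1 can be the difference between
    OOM and training successfully.
    """
    return batch_size * channels * height * width * dtype_bytes


for bs in (1, 3):
    mib = activation_bytes(bs) / 2**20
    print(f"batch_size={bs}: {mib:.4f} MiB per latent-sized tensor")
```

The absolute numbers here are tiny because this is a single tensor; the point is the linear scaling, which holds for the model's full activation footprint as well.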
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.