OUT OF MEMORY
Why does my training run out of memory during multi-GPU training when the batch size is set to 4, and still run out of memory even at batch size 2, while the paper is able to use 8? I'm using the same device as the one mentioned in the paper, a 4090, and the checkpoints are SD 1.4 and interact-diffusion-v1-1.pth. Thank you!!
We use this command for training:

```bash
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 main.py --yaml_file configs/hoi_hico_text.yaml --ckpt <existing_gligen_checkpoint> --name test --batch_size=4 --gradient_accumulation_step 2 --total_iters 500000 --amp true --disable_inference_in_training true --official_ckpt_name <existing SD v1.4/v1.5 checkpoint>
```
We use AMP, and the batch size is set to 4 for each GPU.
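For reference, here is a minimal sketch of how the `--amp true` and `--gradient_accumulation_step 2` flags reduce memory in a PyTorch training loop. The model, data, and loss below are toy placeholders, not the repo's actual training code:

```python
import torch
import torch.nn as nn

# Toy stand-ins; the real run trains the Stable Diffusion UNet from main.py.
model = nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()

accum_steps = 2   # corresponds to --gradient_accumulation_step 2
batch_size = 4    # corresponds to --batch_size=4 (per GPU)

for step in range(8):
    x = torch.randn(batch_size, 512, device="cuda")
    # autocast runs the forward pass in float16, roughly halving
    # activation memory compared to full float32.
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean() / accum_steps  # scale for accumulation
    scaler.scale(loss).backward()
    # The optimizer steps once per accum_steps micro-batches, so each GPU
    # reaches an effective batch of 4 * 2 = 8 while only ever holding the
    # activations of 4 samples in memory at a time.
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```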