
Running out of memory on a 24 GB GPU

rubenhak opened this issue 3 years ago · 0 comments

I'm trying to run SD 2.1 on a GPU with 24 GB of memory and am getting the out-of-memory error below.

$ python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
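
Note that the error log below shows a DDIM data shape of (3, 4, 96, 96), i.e. a batch of 3 samples at 768x768. Assuming --n_samples controls that batch size (it defaults to 3 in txt2img.py), presumably dropping it to 1 would reduce peak memory; a sketch of what I would try:

$ python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768 --n_samples 1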

ERROR:

Global seed set to 42
Loading model from v2-1_768-ema-pruned.ckpt
Global Step: 110000
No module 'xformers'. Proceeding without it.
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling:   0%|                                        | 0/3 [00:00<?, ?it/s]
Data shape for DDIM sampling is (3, 4, 96, 96), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler:   0%|                                    | 0/50 [00:03<?, ?it/s]
data:   0%|                                            | 0/1 [00:04<?, ?it/s]
Sampling:   0%|                                        | 0/3 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "scripts/txt2img.py", line 289, in <module>
    main(opt)
  File "scripts/txt2img.py", line 248, in main
    samples, _ = sampler.sample(S=opt.steps,
  File "/opt/conda/envs/ldm-v2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/models/diffusion/ddim.py", line 103, in sample
    samples, intermediates = self.ddim_sampling(conditioning, size,
  File "/opt/conda/envs/ldm-v2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/models/diffusion/ddim.py", line 163, in ddim_sampling
    outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/opt/conda/envs/ldm-v2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/models/diffusion/ddim.py", line 211, in p_sample_ddim
    model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/opt/conda/envs/ldm-v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/models/diffusion/ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/opt/conda/envs/ldm-v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
    h = module(h, emb, context)
  File "/opt/conda/envs/ldm-v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "/opt/conda/envs/ldm-v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/modules/attention.py", line 334, in forward
    x = block(x, context=context[i])
  File "/opt/conda/envs/ldm-v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/modules/attention.py", line 269, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/modules/diffusionmodules/util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/modules/diffusionmodules/util.py", line 129, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/modules/attention.py", line 272, in _forward
    x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
  File "/opt/conda/envs/ldm-v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/Stability-AI-stablediffusion.git/ldm/modules/attention.py", line 177, in forward
    sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
RuntimeError: CUDA out of memory. Tried to allocate 9.49 GiB (GPU 0; 22.06 GiB total capacity; 14.72 GiB already allocated; 5.68 GiB free; 14.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
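
The error message itself suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to avoid fragmentation. I haven't confirmed this is enough on its own for 768x768, but as a sketch it would be set before launching the script, e.g.:

$ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
$ python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768

The log also shows "No module 'xformers'. Proceeding without it.", so attention runs through the vanilla einsum path that fails above; installing xformers would presumably lower peak attention memory as well.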

rubenhak · Jan 04 '23 09:01