
Program stuck on "Sampling"

Open s-sangwon opened this issue 2 years ago • 3 comments

(base) C:\Users\GTX73\stablediffusion>python scripts/txt2img.py --prompt "our galaxy itself contains a hundred billion stars" --ckpt C:\Users\GTX73\Downloads\768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768 --n_samples 1
Global seed set to 42
Loading model from C:\Users\GTX73\Downloads\768-v-ema.ckpt
Global Step: 140000
LatentDiffusion: Running in v-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling: 0%| | 0/3 [00:00<?, ?it/s]
Data shape for DDIM sampling is (1, 4, 96, 96), eta 0.0
Running DDIM Sampling with 50 timesteps
| 0/1 [00:00<?, ?it/s]
DDIM Sampler: 0%| | 0/50 [00:00<?, ?it/s]

[screenshot of the console stalled at this point]

How do you solve this problem?

s-sangwon avatar May 16 '23 04:05 s-sangwon
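
One thing worth ruling out before assuming a bug: the outer "Sampling" bar counts --n_iter iterations (default 3 in scripts/txt2img.py, which matches the 0/3 above), and a DDIM bar that sits at 0% for a long time can simply mean the model is running on CPU, where a single 768x768 step can take minutes. A minimal check, assuming PyTorch is installed, run in the same environment as txt2img.py:

import time
import torch

# If this prints False, the script is either failing on CUDA setup or
# (depending on the script version) silently falling back to CPU.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))

# Rough throughput probe: one large matmul. Milliseconds on a GPU vs
# seconds on CPU is the difference between "slow" and "hung".
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4096, 4096, device=device)
t0 = time.time()
_ = x @ x
if device == "cuda":
    torch.cuda.synchronize()  # wait for the async GPU kernel to finish
print(f"{device} matmul took {time.time() - t0:.3f}s")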

Same here. I also just ran the sample commands:

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt checkpoints/v2-1_512-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference.yaml

and

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt checkpoints/v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768

and then got the same result as in the picture above.

hotelbread avatar May 16 '23 06:05 hotelbread
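
If the run is merely slow rather than hung, shrinking the workload should make the bar visibly move. Something along these lines, where --n_samples, --n_iter, and --steps are flags defined in scripts/txt2img.py's argparser (worth double-checking against your checkout):

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt checkpoints/v2-1_512-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference.yaml --n_samples 1 --n_iter 1 --steps 20

With one iteration, one sample, and 20 DDIM steps, even a CPU run should advance the bar within a few minutes; if it still sits at 0%, the process is genuinely stuck rather than slow.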

I also have the same problem

ZXStudio avatar May 16 '23 08:05 ZXStudio

I guess the code currently does not support v-sampling: https://github.com/Stability-AI/stablediffusion/blob/main/ldm/models/diffusion/ddpm.py#L920
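
For context: the linked line is where LatentDiffusion branches on its parameterization. v-prediction checkpoints such as 768-v-ema.ckpt predict v rather than the noise eps, so an eps-based sampler has to convert the model output first; if I read the repo right, ddpm.py does this in predict_eps_from_z_and_v. A minimal sketch of that conversion using the standard v-prediction relations, with illustrative function names rather than the repo's actual API:

import torch

# v-prediction (Salimans & Ho, "Progressive Distillation for Fast
# Sampling of Diffusion Models"): with
#   x_t = a_t * x_0 + s_t * eps,
#   a_t = sqrt(alphas_cumprod[t]), s_t = sqrt(1 - alphas_cumprod[t]),
# the model is trained to predict v = a_t * eps - s_t * x_0.
# Inverting these relations gives what an eps-based DDIM step needs.

def eps_from_v(x_t, v, a_t, s_t):
    # Noise prediction recovered from the v output.
    return a_t * v + s_t * x_t

def x0_from_v(x_t, v, a_t, s_t):
    # Denoised-sample prediction recovered from the v output.
    return a_t * x_t - s_t * v

# Quick numerical self-check with scalar coefficients (a^2 + s^2 = 1):
a_t, s_t = 0.8, 0.6
x0, eps = torch.randn(4), torch.randn(4)
x_t = a_t * x0 + s_t * eps
v = a_t * eps - s_t * x0
assert torch.allclose(eps_from_v(x_t, v, a_t, s_t), eps, atol=1e-6)
assert torch.allclose(x0_from_v(x_t, v, a_t, s_t), x0, atol=1e-6)

So a stall would be surprising here; a missing or wrong branch for the "v" parameterization would more typically raise an error or produce noise, but either way the parameterization handling at that line is the right place to look.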

IceClear avatar Jun 08 '23 16:06 IceClear