High-res Stable Diffusion gives repeated images
Describe the bug
I have tried to run SD at 1024x1024 and 1664x1664 resolutions, but I almost always get repeated objects in the generated picture. Is there any simple way to fix it?
For example: a 1024x1024 result is https://ibb.co/4RKhS6y and a 1664x1664 result is https://ibb.co/bLbmHtp
Reproduction
No response
Logs
No response
System Info
stable-diffusion 1-4
Could the cause be that the model was trained only on 512x512 images?
I'm also getting the same issue, btw. I just stick with 512 and upscale.
Hmm, I don't have much experience with high-res images yet.
That is a commonly reported problem when creating high-res images. One option is to create a low-resolution image first and then use img2img to upscale it to a high-resolution image.
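Here's a minimal sketch of that workflow with diffusers; the model ID, prompt, target size, and strength value are just illustrative assumptions, and the `image=` argument name assumes a recent diffusers version:

```python
# Sketch of the workflow above: generate at the native 512x512, upscale
# naively, then let img2img re-render detail at the target resolution.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model_id = "CompVis/stable-diffusion-v1-4"  # assumed checkpoint
prompt = "a medieval castle on a cliff, golden hour"  # illustrative prompt

# 1. txt2img at the resolution the model was trained on.
txt2img = StableDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
low_res = txt2img(prompt, height=512, width=512).images[0]

# 2. Plain resize to the target size; blurry, but the composition is locked in.
upscaled = low_res.resize((1024, 1024), Image.LANCZOS)

# 3. img2img with moderate strength: enough noise to re-render fine detail,
#    not enough to re-invent the layout (and duplicate objects).
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
high_res = img2img(prompt=prompt, image=upscaled, strength=0.4).images[0]
high_res.save("castle_1024.png")
```

To avoid loading the weights twice, you can also build the second pipeline from the first one's components: `StableDiffusionImg2ImgPipeline(**txt2img.components)`.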
I have not tried this yet, but the recently implemented Highres. fix feature in AUTOMATIC1111/stable-diffusion-webui may be more helpful.
https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/6d7ca54a1a9f448419acb31a54c5e28f3e4bcc4c#commitcomment-84506718
Sadly this problem exists even in DreamStudio, Stability AI's web GUI, where you have to pay for your credits. They offer resolutions up to 1024px, but that's not much use if some of the pictures get split...
I would guess this could be due to insufficient VRAM. Supposedly you need 4 GB for a 512x512 image; if that's the case, 1024x1024 would need 16 GB. I'm thinking that when the RAM runs out, it just reuses what is already stored in RAM. I could be wrong, though. Maybe as the process iterates it begins to "see" objects and continues to resolve them as it progresses.
I think the way AUTOMATIC1111 currently processes high-resolution images makes a lot of sense (i.e., first doing low res, and only in the final denoising steps upscaling the image and smoothing it with noise).
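If I read the commit right, the trick is doing the second pass in latent space. Here's a rough sketch of that idea using diffusers building blocks; the step count, 0.6 strength, and 7.5 guidance scale are my own assumptions (not necessarily what the webui does), and `encode_prompt`/`output_type="latent"` assume a recent diffusers version:

```python
# Rough latent-space version of the "highres fix" idea: denoise at 512 first,
# upsample the latents, re-noise them partially, and finish denoising at 1024.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to(device)
prompt = "a portrait of a knight in ornate armour"  # illustrative prompt

# 1. Full 512x512 run, but keep the latents instead of decoding to pixels.
low_latents = pipe(prompt, height=512, width=512, output_type="latent").images

# 2. Upsample the 64x64 latents to 128x128 (-> 1024x1024 after VAE decoding).
hi_latents = F.interpolate(low_latents, scale_factor=2, mode="bilinear")

# 3. Re-noise to an intermediate timestep and denoise the rest of the way,
#    i.e. img2img, but operating on latents instead of a decoded image.
sched = pipe.scheduler
sched.set_timesteps(50, device=device)
strength = 0.6
timesteps = sched.timesteps[int(50 * (1 - strength)):]
latents = sched.add_noise(hi_latents, torch.randn_like(hi_latents), timesteps[:1])

# Classifier-free guidance embeddings (conditional + unconditional).
cond, uncond = pipe.encode_prompt(prompt, device, 1, True)
embeds = torch.cat([uncond, cond])

for t in timesteps:
    inp = sched.scale_model_input(torch.cat([latents] * 2), t)
    with torch.no_grad():
        noise_pred = pipe.unet(inp, t, encoder_hidden_states=embeds).sample
    n_uncond, n_cond = noise_pred.chunk(2)
    noise_pred = n_uncond + 7.5 * (n_cond - n_uncond)
    latents = sched.step(noise_pred, t, latents).prev_sample

with torch.no_grad():
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
pipe.image_processor.postprocess(image)[0].save("knight_1024.png")
```

The key difference from plain img2img is step 2: the composition is fixed before any decode/re-encode round trip, which is why the repeated-object artifacts don't get a chance to appear at the higher resolution.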
> I would guess this could be due to insufficient VRAM. Supposedly you need 4 GB for a 512x512 image; if that's the case, 1024x1024 would need 16 GB.
This has nothing to do with what the OP describes, and it's not how neural networks work. It's simply that the training data was 512x512, so the model fills the extra area with repetitions of patterns it knows. If you don't have the required VRAM, the program just won't run at all.
I was able to generate 512x2048 images without these glitches. Try "soccer team group photo" in the prompt :) (my card is a 24 GB Nvidia)
Well a "soccer team" is already a collection of soccer players, which is a repetition with small variations anyway. You can generate magnificent landscapes on hi-res because repeating patterns in landscapes don't look out of place or context
Somehow I could get away from it by adding "(one person filling the whole view)"; that gave me better chances of actually getting only one person in the whole image. This only happens in landscape orientation, as I think the AI treats it as multiple portraits stuck together. If we could find a word the AI understands to mean the whole view being generated, and use it to explicitly tell the AI to treat that view as "one" generation, it would help a lot.
I think that needs training on high-res pictures, which is... costly.