High-res Stable Diffusion gives repeated images
Describe the bug
I have tried to run SD at 1024x1024 and 1664x1664 resolutions, but I almost always get repeated objects in the generated picture. Is there any simple way to fix it?
For example: a 1024x1024 result is https://ibb.co/4RKhS6y and a 1664x1664 result is https://ibb.co/bLbmHtp
Reproduction
No response
Logs
No response
System Info
stable-diffusion 1-4
Could the cause be that the model was trained only on 512x512 images?
I'm also getting the same issue, btw. I just stick with 512 and upscale.
Hmm, I don't have much experience with high-res images yet.
That is a commonly reported problem when creating high-res images. One option is to create a low-resolution image first and then use img2img to upscale it to a high-resolution image.
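Here's a minimal sketch of that workflow with diffusers; the model ID, prompt, target size, and strength value are just illustrative assumptions, and the `image=` argument name assumes a recent diffusers version:

```python
# Sketch of the workflow above: generate at the native 512x512, upscale
# naively, then let img2img re-render detail at the target resolution.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model_id = "CompVis/stable-diffusion-v1-4"  # assumed checkpoint
prompt = "a medieval castle on a cliff, golden hour"  # illustrative prompt

# 1. txt2img at the resolution the model was trained on.
txt2img = StableDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
low_res = txt2img(prompt, height=512, width=512).images[0]

# 2. Plain resize to the target size; blurry, but the composition is locked in.
upscaled = low_res.resize((1024, 1024), Image.LANCZOS)

# 3. img2img with moderate strength: enough noise to re-render fine detail,
#    not enough to re-invent the layout (and duplicate objects).
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
high_res = img2img(prompt=prompt, image=upscaled, strength=0.4).images[0]
high_res.save("castle_1024.png")
```

To avoid loading the weights twice, you can also build the second pipeline from the first one's components: `StableDiffusionImg2ImgPipeline(**txt2img.components)`.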
I have not tried this yet, but the recently implemented Highres. fix feature in AUTOMATIC1111/stable-diffusion-webui may be more helpful.
https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/6d7ca54a1a9f448419acb31a54c5e28f3e4bcc4c#commitcomment-84506718
Sadly this problem exists even in DreamStudio, Stability AI's web GUI, where you have to pay for your credits. They offer resolutions up to 1024px, but that's not much use if some of the pictures get split...
I would guess this could be due to insufficient VRAM. Supposedly you need 4 GB for a 512x512 image; if that's the case, 1024x1024 would need 16 GB. I'm thinking that when the RAM runs out, it just reuses what is already stored in RAM. I could be wrong, though. Maybe as the process iterates it begins to "see" objects and continues to resolve them as it progresses.
I think the way AUTOMATIC1111 currently processes high-resolution images makes a lot of sense (i.e., first doing low res, and only in the final denoising steps upscaling the image and smoothing it with noise).
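If I read the commit right, the trick is doing the second pass in latent space. Here's a rough sketch of that idea using diffusers building blocks; the step count, 0.6 strength, and 7.5 guidance scale are my own assumptions (not necessarily what the webui does), and `encode_prompt`/`output_type="latent"` assume a recent diffusers version:

```python
# Rough latent-space version of the "highres fix" idea: denoise at 512 first,
# upsample the latents, re-noise them partially, and finish denoising at 1024.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to(device)
prompt = "a portrait of a knight in ornate armour"  # illustrative prompt

# 1. Full 512x512 run, but keep the latents instead of decoding to pixels.
low_latents = pipe(prompt, height=512, width=512, output_type="latent").images

# 2. Upsample the 64x64 latents to 128x128 (-> 1024x1024 after VAE decoding).
hi_latents = F.interpolate(low_latents, scale_factor=2, mode="bilinear")

# 3. Re-noise to an intermediate timestep and denoise the rest of the way,
#    i.e. img2img, but operating on latents instead of a decoded image.
sched = pipe.scheduler
sched.set_timesteps(50, device=device)
strength = 0.6
timesteps = sched.timesteps[int(50 * (1 - strength)):]
latents = sched.add_noise(hi_latents, torch.randn_like(hi_latents), timesteps[:1])

# Classifier-free guidance embeddings (conditional + unconditional).
cond, uncond = pipe.encode_prompt(prompt, device, 1, True)
embeds = torch.cat([uncond, cond])

for t in timesteps:
    inp = sched.scale_model_input(torch.cat([latents] * 2), t)
    with torch.no_grad():
        noise_pred = pipe.unet(inp, t, encoder_hidden_states=embeds).sample
    n_uncond, n_cond = noise_pred.chunk(2)
    noise_pred = n_uncond + 7.5 * (n_cond - n_uncond)
    latents = sched.step(noise_pred, t, latents).prev_sample

with torch.no_grad():
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
pipe.image_processor.postprocess(image)[0].save("knight_1024.png")
```

The key difference from plain img2img is step 2: the composition is fixed before any decode/re-encode round trip, which is why the repeated-object artifacts don't get a chance to appear at the higher resolution.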
> I would guess this could be due to insufficient VRAM. Supposedly you need 4 GB for a 512x512 image; if that's the case, 1024x1024 would need 16 GB.
This has nothing to do with what the OP describes, and it's not how neural networks work. It's simply that the training data was 512x512, so the model fills the extra area with repetitions of patterns it knows. If you don't have the required VRAM, the program just won't run at all.
I was able to generate 512x2048 images without these glitches. Try "soccer team group photo" in the prompt :) (my card is a 24 GB Nvidia)
Well a "soccer team" is already a collection of soccer players, which is a repetition with small variations anyway. You can generate magnificent landscapes on hi-res because repeating patterns in landscapes don't look out of place or context
Somehow I could get away from it by adding "(one person filling the whole view)"; that gave me better chances of actually getting only one person in the whole image. This only happens in landscape orientation, as I think the AI treats it as multiple portraits stuck together. If we could find a word the AI understands to mean the whole view being generated, and use it to explicitly tell the AI to treat that view as "one" generation, it would help a lot.
I think that needs training on high-res pictures, which is... costly.