Is it good to train on 512x512, inference on 512x768?
Thank you for your great work. It is really useful for me. I want to run inference at a resolution of 512x768. I know I could do that with a model trained on 512x512, but to get the best performance on both 512x512 and 512x768, should I train on 512x512, 512x768, or 768x768? I'd appreciate your advice.
In our experiments, if there is a resolution discrepancy between training and inference, performance usually degrades quite a bit, i.e., if you fine-tune on a custom dataset with a different resolution, the quality of the generated images might not be on par. Usually, the number of training images seen during fine-tuning dictates how well the model adapts.
If an upscaled resolution is a requirement for you, would you mind trying out the latent upscaler model we recently introduced? You can find an application of it in this Space: https://huggingface.co/spaces/huggingface-projects/stable-diffusion-latent-upscaler/blob/main/app.py.
Cc: @yiyixuxu
@sayakpaul, thanks so much for your reply. So if I want to fine-tune SD 1.5 with 200k images, and most of the images are at a resolution of 512x768, then according to what you said, training on 512x768 should be better for getting the best performance at 512x768? BTW, thanks for the latent upscaler model, I will try it later.
Yes, sure, it's worth giving it a try, but I just wanted to share our experience so you are aware of the poor results that might arise :)
@sayakpaul
I tried the upscaler app, and it is not good yet, maybe worse than Real-ESRGAN. Is the result I got expected? prompt="(portrait:1.0),face in the center, Mage godess with white hair and mage god with black hair, pale skin, fantasy, in love, couple, hug each other, sharp focus, intricate, elegant, illustration, ambient lighting, art by stefanie law, qistina khalidah, tranding on artstation, art by luis royo higly detailed studio lighting"
It's probably happening because of the discrepancy with the training data. So, I guess your best bet for now is to fine-tune the model or use something like MultiDiffusion. Cc: @omerbt
@sayakpaul Thanks so much.
Hi! Indeed, as @sayakpaul mentioned, even though Stable Diffusion can technically process higher-resolution images, we observed that it often produces poor-quality outputs (they are out-of-distribution w.r.t. its training data). MultiDiffusion tackles this and allows generating high-quality images at arbitrary aspect ratios. See this documentation for how to use it through diffusers.
@tengshaofeng,
You can also definitely try just directly generating larger images by setting height and width - this works quite well sometimes. See: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.call.height
@omerbt @patrickvonplaten thanks for your replies, guys. I learned so much. Thanks again.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.