Add the DreamBooth paper's super-resolution fine-tuning stage
I'm currently experimenting with the DreamBooth training script, and while the results are very impressive, I think it is missing the super-resolution fine-tuning phase?
In the DreamBooth paper they fine-tune a low-resolution diffusion model to link a rare token with some instance images (with or without the class-preservation loss), and then on top of it fine-tune a super-resolution model on the same instance images to improve the generation of fine details.
As far as I can see, the current script only does the first stage. I'm aware that Stable Diffusion is based on latent diffusion, so the super-resolution stage would need some workaround for the encoding step.
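For concreteness, the second stage I mean is the SR3/Imagen-style training the paper builds on: add noise to the high-res instance image and have the model predict that noise, conditioned on the low-res version. Here's a minimal PyTorch sketch of one training step; `SRModel` and `sr_finetune_step` are hypothetical names, and the tiny conv net is just a stand-in for a real timestep-aware SR UNet:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRModel(nn.Module):
    # Toy stand-in for an SR diffusion UNet (a real one is much larger
    # and conditions on the timestep as well).
    def __init__(self, channels=3):
        super().__init__()
        # Input: noisy high-res image concatenated with the upsampled low-res image.
        self.net = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, noisy_hr, lr_upsampled, t):
        return self.net(torch.cat([noisy_hr, lr_upsampled], dim=1))

def sr_finetune_step(model, optimizer, hr_images, num_train_timesteps=1000):
    # Low-res conditioning: downsample the instance image, then upsample
    # back to the target resolution (SR3/Imagen-style conditioning).
    lr = F.interpolate(hr_images, scale_factor=0.25, mode="bilinear")
    lr_up = F.interpolate(lr, size=hr_images.shape[-2:], mode="bilinear")

    # Standard DDPM forward process with a linear beta schedule.
    betas = torch.linspace(1e-4, 0.02, num_train_timesteps)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, num_train_timesteps, (hr_images.shape[0],))
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(hr_images)
    noisy_hr = a.sqrt() * hr_images + (1 - a).sqrt() * noise

    # The model predicts the added noise; simple MSE objective.
    loss = F.mse_loss(model(noisy_hr, lr_up, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For Stable Diffusion this would additionally need the VAE encode/decode workaround mentioned above, since the diffusion runs in latent space rather than on pixels.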
I guess I was just wondering whether that's something that would be worthwhile to work on. I'd be happy to help with the implementation as well.
Hey @alicranck,
Yes, that's very true. Note, however, that the super-resolution model doesn't need to be strongly conditioned on the text input; it's usually a separate model that performs quite well even without text conditioning.
We are currently working on adding a nice latent super-resolution model here: https://github.com/huggingface/diffusers/pull/1321 Once that's added, we can look into potentially adding this to the DreamBooth training!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.