
Super Resolution Diffusion Model

Open dvIdol opened this issue 3 years ago • 21 comments

Hello.

I am very interested in the unconditional image generation pipelines, like the one in this example: https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation

I have trained a 128x128 network and it gives very good results for what I need. However, the resolution is very low.

The main diffusers README mentions a super-resolution diffusion model that runs after the low-resolution model. How do I build this model? There are no examples, and everything seems to be shifting toward Stable Diffusion. Is there a guide for training a low-to-high-resolution diffusion model?

Thank you for making such a great library.

dvIdol avatar Sep 10 '22 18:09 dvIdol

Related to https://github.com/huggingface/diffusers/issues/146

patrickvonplaten avatar Sep 13 '22 16:09 patrickvonplaten

@patil-suraj IIRC you had plans for an SR example too? I might not have bandwidth in the next few weeks, but can work on SR after, if it's not high on your list.

anton-l avatar Oct 27 '22 12:10 anton-l

Yes, that's on my todo list. But if anyone is interested feel free to open a PR, happy to help :)

patil-suraj avatar Oct 27 '22 13:10 patil-suraj

I'm interested in this too, and it's becoming relevant for the ongoing fast.ai course. I might have some time to start working on this in a few days, and/or help @anton-l and @patil-suraj when they do :)

pcuenca avatar Oct 27 '22 14:10 pcuenca

That's awesome Pedro! I'm looking at implementing SR3 (https://iterative-refinement.github.io/) for this task.
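For readers unfamiliar with SR3: its core idea is to condition a standard denoising UNet on the low-resolution image by upsampling it to the target size and concatenating it channel-wise with the noisy high-resolution image. A minimal NumPy sketch of the input construction (array names, shapes, and values are illustrative, not taken from any repo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real data: a high-res target and the low-res input
# already upsampled (e.g. bicubically) to the high-res size.
hr = rng.random((1, 3, 128, 128)).astype(np.float32)
lr_upsampled = rng.random((1, 3, 128, 128)).astype(np.float32)

# Forward process at some noise level s = sqrt(gamma):
# x_noisy = s * x_0 + sqrt(1 - s^2) * eps
s = 0.7
noise = rng.standard_normal(hr.shape).astype(np.float32)
noisy_hr = s * hr + (1 - s**2) ** 0.5 * noise

# The UNet input is the channel-wise concatenation [noisy HR | upsampled LR],
# so a 3-channel image model becomes a 6-channel-input model.
unet_input = np.concatenate([noisy_hr, lr_upsampled], axis=1)
print(unet_input.shape)  # (1, 6, 128, 128)
```

The UNet then predicts the noise (or the clean image) from this 6-channel input, exactly as in unconditional DDPM training otherwise.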

patil-suraj avatar Oct 28 '22 09:10 patil-suraj

My thought exactly :)

pcuenca avatar Oct 28 '22 20:10 pcuenca

@patil-suraj @pcuenca I can spend some time implementing an SR example this weekend (PyTorch & Flax).

duongna21 avatar Oct 31 '22 07:10 duongna21

Reopening this issue as it's related to training a super-resolution model.

patil-suraj avatar Nov 09 '22 13:11 patil-suraj

Also, thanks to @duongna21, a super-resolution model is now available in diffusers:

from diffusers import LDMSuperResolutionPipeline
from PIL import Image

pipe = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")
pipe.to("cuda")

# The 4x OpenImages checkpoint expects a 128x128 RGB input
img = Image.open("low_resolution.jpg").convert("RGB").resize((128, 128))
output = pipe(img, num_inference_steps=100, eta=1)
output.images[0]  # the 4x-upscaled (512x512) PIL image

patil-suraj avatar Nov 09 '22 13:11 patil-suraj

@patil-suraj Hi, how is it going? There's an unofficial repo that has gotten a lot of attention: https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement

> That's awesome Pedro! I'm looking at implementing SR3 https://iterative-refinement.github.io/ for this task.

ElliotQi avatar Nov 15 '22 05:11 ElliotQi

Haven't really started anything yet, thanks for sharing the repo.

patil-suraj avatar Nov 15 '22 10:11 patil-suraj

Hi! I am interested in using SR3 for the work on my master's thesis, and would also love to contribute to the implementation!

chibi-dogs avatar Feb 03 '23 10:02 chibi-dogs

I also wanted to share OpenAI's guided diffusion repo: Guided Diffusion. SR3 uses the improved version of DDPM proposed by OpenAI in that repo. I think you might also find it useful for the implementation of SR3, or even its follow-up model, Palette. Here is a link to the paper that introduced Palette (Image-to-Image Diffusion Models) and the authors' website.

chibi-dogs avatar Feb 03 '23 10:02 chibi-dogs

@basab-gupta, if you are interested, feel free to start working on it; happy to help with the PR :)

We can add this example under the examples/research_projects directory.

patil-suraj avatar Feb 03 '23 13:02 patil-suraj

@patil-suraj Thank you! Do you mean adding a link to the guided diffusion repo under examples/research_projects?

I'll try to get started with the implementation. Also, feel free to HMU in case anyone else is interested in working on this together :)

chibi-dogs avatar Feb 03 '23 14:02 chibi-dogs

I meant to add a training script leveraging diffusers.

patil-suraj avatar Feb 03 '23 14:02 patil-suraj

I will join you on this script @basab-gupta !

marc-gav avatar Feb 04 '23 11:02 marc-gav

Hi! @patil-suraj, Marc (@marc-gav) and I have a small update for you. We managed to set up a training script. However, the loss plateaus after a point when we run the training. We are thinking of adding a few modifications from Improved Denoising Diffusion Probabilistic Models and were wondering what you think of them:

  1. Add a cosine noise schedule to the DDPM scheduler
  2. Let the model learn the variance, and potentially use the hybrid loss
  3. Adopt the architecture improvements from OpenAI's guided diffusion repo

Also open to any other suggestions that could help us potentially fix this issue.
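For modification (1), the cosine schedule from the Improved DDPM paper can be computed directly. A minimal stdlib-only sketch of the formula (Eq. 17 in that paper, with the usual 0.999 clipping); in diffusers, `DDPMScheduler(beta_schedule="squaredcos_cap_v2")` provides the same schedule:

```python
import math

def cosine_betas(num_timesteps: int, s: float = 0.008, max_beta: float = 0.999):
    """Cosine noise schedule from Nichol & Dhariwal (2021):
    alpha_bar(t) = cos^2(((t/T + s) / (1 + s)) * pi/2), with betas derived
    from the ratio of consecutive alpha_bar values and clipped at max_beta."""
    def alpha_bar(t):
        return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for i in range(num_timesteps):
        t1 = i / num_timesteps
        t2 = (i + 1) / num_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas

betas = cosine_betas(1000)
print(betas[0], betas[-1])  # small near t=0, clipped to 0.999 at the end
```

Compared to the linear schedule, this destroys information more slowly early on, which is one of the fixes the Improved DDPM paper credits for better log-likelihoods.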

chibi-dogs avatar Feb 08 '23 08:02 chibi-dogs

Hi @patil-suraj. We have an update for you. We managed to fix the loss problem from our previous post and now have a working implementation of the SR3 model using HF diffusers. Here are some preliminary results from our experiments: Preliminary Results of 8x super resolution

The results, however, still don't look quite as good as we'd like. We are currently tuning the hyperparameters to optimize the results and will hopefully get back to you soon with more positive updates :)

chibi-dogs avatar Feb 16 '23 15:02 chibi-dogs

Very cool!

patrickvonplaten avatar Feb 16 '23 18:02 patrickvonplaten

@patrickvonplaten Thanks :)

chibi-dogs avatar Feb 16 '23 20:02 chibi-dogs

Hi @basab-gupta @marc-gav! Thanks for your contribution. I have a question about SR3: as shown in Fig. 12 of the paper, SR3 (like other Google diffusion models) uses noise-level sampling during training, which enables the use of different noise schedules during inference. But I always get noisy output when using fewer inference timesteps than training timesteps. Did you run any experiments with different inference timestep counts?
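For reference, the noise-level sampling trick in question (condition the model on a continuous sqrt-alpha-bar value drawn between adjacent discrete levels, instead of on an integer timestep) can be sketched as follows; the function names are illustrative, not from any repo:

```python
import math
import random

def make_sqrt_alpha_bars(betas):
    """sqrt(alpha_bar_t) for each step of a discrete schedule
    (alpha_bar_t is the cumulative product of (1 - beta))."""
    sqrt_ab, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        sqrt_ab.append(math.sqrt(prod))
    return sqrt_ab

def sample_noise_level(sqrt_ab, rng=random):
    """WaveGrad/SR3-style training: pick a random segment
    [sqrt(ab_t), sqrt(ab_{t-1})] of the training schedule and sample
    uniformly inside it. The model is conditioned on this continuous
    scalar, so at inference time any schedule whose levels fall in the
    same range can be used, with any number of steps."""
    t = rng.randrange(1, len(sqrt_ab))
    lo, hi = sqrt_ab[t], sqrt_ab[t - 1]  # schedule is decreasing in t
    return lo + rng.random() * (hi - lo)

# Demo with a linear beta schedule (illustrative values):
betas = [1e-4 + i * (0.02 - 1e-4) / 999 for i in range(1000)]
sqrt_ab = make_sqrt_alpha_bars(betas)
level = sample_noise_level(sqrt_ab)  # scalar in [sqrt_ab[-1], sqrt_ab[0]]
```

A model trained on discrete timestep indices instead of such levels has no way to interpret a shorter inference schedule, which is one plausible cause of the noisy outputs described above.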

ElliotQi avatar Feb 23 '23 07:02 ElliotQi

Hi @ElliotQi! We are still working on the inference script to make sure it lets us vary the noise schedule and the number of steps independently of the ones used in training. Unfortunately, because these models take a while to train, our progress has been a bit slow.

Regarding your question, have you tried adjusting the values of $\beta_{0}$, $\beta_{N}$, and $N$ during inference? To my understanding, the authors of SR3 fix $N$ (the number of reverse steps) at 100 and then do a hyperparameter sweep to find the best combination of beta values, using FID scores on their validation dataset to pick the winner. We will let you know once we've made some progress on our inference script.

The authors of SR3 use the noise conditioning described in the WaveGrad paper, another diffusion model from the Google Brain team, used for speech synthesis. I came across this useful repository that has a script to tune the WaveGrad model to find the best inference schedule; maybe you could take a look at that? Alternatively, I believe you could also use something like Optuna or W&B to do the hyperparameter tuning for you.
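The beta sweep described above could be organized as a simple grid search. A sketch where `evaluate_fid` is a placeholder you supply (it should run the sampler with the given linear schedule and return the FID score, lower being better); it is not a real diffusers API:

```python
import itertools

def sweep_inference_schedule(evaluate_fid, n_steps=100,
                             beta_starts=(1e-6, 1e-5, 1e-4),
                             beta_ends=(1e-2, 2e-2, 5e-2)):
    """Grid search over (beta_start, beta_end) with the number of reverse
    steps fixed, as the SR3 authors describe (N fixed at 100, betas swept,
    best combination chosen by validation FID)."""
    best = None
    for b0, b1 in itertools.product(beta_starts, beta_ends):
        fid = evaluate_fid(beta_start=b0, beta_end=b1, num_steps=n_steps)
        if best is None or fid < best[0]:
            best = (fid, b0, b1)
    return best

# Demo with a dummy objective standing in for a real FID evaluation:
def dummy_fid(beta_start, beta_end, num_steps):
    return abs(beta_start - 1e-5) + abs(beta_end - 2e-2)

best_fid, best_b0, best_b1 = sweep_inference_schedule(dummy_fid)
print(best_b0, best_b1)  # picks the grid point minimizing the dummy score
```

For more than two hyperparameters, replacing the grid with an Optuna study or a W&B sweep would be the natural next step.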

chibi-dogs avatar Feb 23 '23 12:02 chibi-dogs

@basab-gupta Thanks! :) I tested several values of beta, but none of them gave good results. I'm still trying some hyperparameters for better performance; thanks for sharing the WaveGrad repo. In fact, I noticed that the Deblurring paper used this continuous noise schedule to achieve a perception-distortion trade-off, which is so impressive that I'm tuning my model to reproduce it. Thanks for your advice!

ElliotQi avatar Feb 23 '23 14:02 ElliotQi