clip vs no-clip sampling
Hi, thank you for this wonderful repo.
It seems like the clipped version of the sampling works better. Can you explain which sampling algorithm is used? The no-clip version seems like the original sampling algorithm.
Thanks
Hi, @ariel415el
Thanks for your attention! Clipping x_0 yields better results because we know that x_0 lies in [-1, 1] (x_0 is the normalized input image; we normalize images to [-1, 1] during training), and we then use the clipped x_0 to compute the mean and std for every timestep instead of using only x_t. I believe the original DDPM implementation also uses the clipped-x_0 trick.
Both methods come from the theory of the reverse diffusion process. Here is what you can find in Lilian Weng's blog:
- We know how to compute x_0 from x_t and the predicted noise:

$$\mathbf{x}_0 = \frac{1}{\sqrt{\bar{\alpha}_t}}\left(\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}_t\right)$$

- And we compute the mean as (this uses x_0, so we can clip it!):

$$\tilde{\boldsymbol{\mu}}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\mathbf{x}_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\mathbf{x}_t$$

- Or (this does not use x_0, so there is no clipped x_0):

$$\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right)$$
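To make the two variants concrete, here is a minimal PyTorch sketch of one reverse-step mean computation. This is not the repo's exact code; the function `reverse_step_mean` and the tensor names `alphas` / `alphas_cumprod` are made up for illustration:

```python
import torch

def reverse_step_mean(x_t, eps_pred, t, alphas, alphas_cumprod, clip_x0=True):
    """Mean of p(x_{t-1} | x_t), with or without the clipped-x_0 trick.

    `alphas` holds the per-step alpha_t and `alphas_cumprod` the cumulative
    products bar{alpha}_t; `t` is a plain Python int.
    """
    alpha_t = alphas[t]
    alpha_bar_t = alphas_cumprod[t]
    alpha_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    beta_t = 1.0 - alpha_t

    if clip_x0:
        # Recover x_0 from x_t and the noise estimate, then clip it to the
        # known data range [-1, 1] before computing the posterior mean.
        x_0 = (x_t - torch.sqrt(1.0 - alpha_bar_t) * eps_pred) / torch.sqrt(alpha_bar_t)
        x_0 = x_0.clamp(-1.0, 1.0)
        mean = (torch.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t)) * x_0 \
             + (torch.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)) * x_t
    else:
        # Substitute the x_0 estimate directly into the mean formula;
        # algebraically equivalent, but an out-of-range x_0 is never corrected.
        mean = (x_t - (1.0 - alpha_t) / torch.sqrt(1.0 - alpha_bar_t) * eps_pred) \
             / torch.sqrt(alpha_t)
    return mean
```

The two branches are the same estimate algebraically; the only difference is the clamp, which is why the trick is free.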
In conclusion, the clipped-x_0 sampling method yields better results because the values are more stable, and it's a free trick!
If we use more advanced training techniques, clipping x_0 is not required.
If you have any other questions, please let me know!
Thank you for your answer. From what I gather, the clipping trick is simply to clip x_0 to [-1, 1] whenever it is used in the sampling process?
@ariel415el Yes, the code is here. https://github.com/bot66/MNISTDiffusion/blob/c7ba8e09174cbb88b9cc314db3bf2e514668681c/model.py#L102
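For anyone skimming, the essence of that line is just a clamp on the x_0 estimate; a minimal stand-in (not the exact repo code, and the shape here is only a placeholder):

```python
import torch

# Illustrative stand-in for the linked line: clamp the estimated x_0
# to the training range [-1, 1] before it enters the posterior mean.
x_0_pred = torch.randn(16, 1, 28, 28)  # hypothetical x_0 estimate
x_0_pred = x_0_pred.clamp(-1.0, 1.0)
```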