clip vs no-clip sampling
Hi, thank you for this wonderful repo.
It seems like the clipped version of the sampling works better. Can you explain which sampling algorithm is used? The no-clip version seems like the original sampling algorithm.
Thanks
Hi, @ariel415el
Thanks for your attention! Clipping x_0 yields better results because we know that x_0 lies in [-1, 1] (x_0 is the normalized input image; we normalize images to [-1, 1] during training), and we then use the clipped x_0 to compute the mean and std for every timestep instead of using only x_t. I believe the original DDPM implementation also uses the clipped-x_0 trick.
Both methods come from the theory of the reverse diffusion process. Here is what you can find in Lilian Weng's blog:
- We know how to compute x_0 from x_t and the predicted noise:

$$\mathbf{x}_0 = \frac{1}{\sqrt{\bar{\alpha}_t}}\left(\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}_t\right)$$

- And we compute the mean as (this uses x_0, so we can clip it!):

$$\tilde{\boldsymbol{\mu}}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\mathbf{x}_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\mathbf{x}_t$$

- Or (this does not use x_0, so there is no clipped x_0):

$$\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right)$$
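To make the two variants concrete, here is a minimal PyTorch sketch of one reverse-step mean computation. This is not the repo's exact code; the function `reverse_step_mean` and the tensor names `alphas` / `alphas_cumprod` are made up for illustration:

```python
import torch

def reverse_step_mean(x_t, eps_pred, t, alphas, alphas_cumprod, clip_x0=True):
    """Mean of p(x_{t-1} | x_t), with or without the clipped-x_0 trick.

    `alphas` holds the per-step alpha_t and `alphas_cumprod` the cumulative
    products bar{alpha}_t; `t` is a plain Python int.
    """
    alpha_t = alphas[t]
    alpha_bar_t = alphas_cumprod[t]
    alpha_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    beta_t = 1.0 - alpha_t

    if clip_x0:
        # Recover x_0 from x_t and the noise estimate, then clip it to the
        # known data range [-1, 1] before computing the posterior mean.
        x_0 = (x_t - torch.sqrt(1.0 - alpha_bar_t) * eps_pred) / torch.sqrt(alpha_bar_t)
        x_0 = x_0.clamp(-1.0, 1.0)
        mean = (torch.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t)) * x_0 \
             + (torch.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)) * x_t
    else:
        # Substitute the x_0 estimate directly into the mean formula;
        # algebraically equivalent, but an out-of-range x_0 is never corrected.
        mean = (x_t - (1.0 - alpha_t) / torch.sqrt(1.0 - alpha_bar_t) * eps_pred) \
             / torch.sqrt(alpha_t)
    return mean
```

The two branches are the same estimate algebraically; the only difference is the clamp, which is why the trick is free.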
In conclusion, the clipped-x_0 sampling method yields better results because the values are more stable, and it's a free trick!
If we use more advanced training techniques, clipping x_0 is not required.
If you have any other questions, please let me know!
Thank you for your answer. From what I gather, the clipping trick is simply to clip x_0 to [-1, 1] whenever it is used in the sampling process?
@ariel415el Yes, the code is here. https://github.com/bot66/MNISTDiffusion/blob/c7ba8e09174cbb88b9cc314db3bf2e514668681c/model.py#L102
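For anyone skimming, the essence of that line is just a clamp on the x_0 estimate; a minimal stand-in (not the exact repo code, and the shape here is only a placeholder):

```python
import torch

# Illustrative stand-in for the linked line: clamp the estimated x_0
# to the training range [-1, 1] before it enters the posterior mean.
x_0_pred = torch.randn(16, 1, 28, 28)  # hypothetical x_0 estimate
x_0_pred = x_0_pred.clamp(-1.0, 1.0)
```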