diffusion-posterior-sampling Paper & implementation differences

Hi, there are a few differences between the paper and this repository, and it would be wonderful if you could clarify the reasons behind them:

1. The Gaussian-noise experiments reported in the paper use sigma_y=0.05, and indeed the config files set config['noise']['sigma']=0.05. But while the images are stretched from [0,1] to [-1,1], sigma is left unchanged, meaning that in practice the added noise has std sigma/2 relative to the [0,1] scale, i.e. y_n is cleaner than the setting reported in the paper. This can be easily checked by computing torch.std(y - y_n) after the creation of y and y_n in sample_condition.py.
2. The paper defines the step-size scalar as a constant divided by the norm of the gradient (Appendix C.2), meaning that the gradient is always normalized before it is scaled. In the code, the constant is defined in config['conditioning']['params']['scale'] and used in PosteriorSampling.conditioning() to scale the gradient, but the gradient is never normalized in the first place (in PosteriorSampling.grad_and_value(), for example). When I add the gradient normalization, the method seems to break.
3. For the Gaussian FFHQ-SRx4 case, Appendix D.1 defines the scale as 1.0, but configs/super_resolution_config.yaml uses 0.3.
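
The scale argument in the first point can be illustrated without the repository. The following is a minimal sketch in plain Python, not the code from sample_condition.py: the helper names `to_unit_range` and `noise_std_in_unit_range` are hypothetical, and uniformly random values stand in for pixels. It adds Gaussian noise with sigma=0.05 to values in [-1,1], maps the clean and noisy values back to [0,1], and measures the std of the difference on that scale.

```python
import random
import statistics


def to_unit_range(v):
    """Map a value from the [-1, 1] scale back to the [0, 1] scale."""
    return (v + 1.0) / 2.0


def noise_std_in_unit_range(sigma, n=200_000, seed=0):
    """Add N(0, sigma^2) noise on the [-1, 1] scale and report its std on the [0, 1] scale."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)        # stand-in for a clean pixel in [-1, 1]
        y_n = x + rng.gauss(0.0, sigma)   # noise added with the unchanged sigma
        diffs.append(to_unit_range(y_n) - to_unit_range(x))
    return statistics.pstdev(diffs)


measured = noise_std_in_unit_range(0.05)
print(f"noise std on the [0, 1] scale: {measured:.4f}")
```

Under these assumptions the measured value should come out close to 0.025, i.e. sigma/2, because the affine map from [-1,1] to [0,1] halves every difference.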

Thank you for your time and effort!

Feb 05 '23 09:02 man-sean

For (2), I think the authors apply the normalization factor before taking the gradient. If you look at ConditioningMethod.grad_and_value (here), they take the gradient of the norm, not the norm squared.
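
To spell out why taking the gradient of the norm already contains a normalization, here is a short worked identity. The shorthand $r(x) = y - \mathcal{A}(\widehat{x}_0(x))$ for the residual is introduced here for illustration and is not taken from the code. By the chain rule, wherever $r \neq 0$:

$$\nabla_x \|r\|_2 = \frac{1}{2\|r\|_2} \nabla_x \|r\|_2^2$$

So scaling the gradient of the norm by a constant $\zeta$ is the same as using a step size of $\zeta / (2\|r\|_2)$ on the gradient of the squared norm. Under this reading, the quantity in the denominator is the norm of the residual rather than the norm of the gradient, which might explain why normalizing the gradient a second time changes the behavior.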

I believe there's another difference between Alg. 1 of the paper and the code. In EpsilonXMeanProcessor.predict_xstart (here), the coefficient applied to the score-model output is different from the coefficient in line 4 of Alg. 1. In the paper, the coefficient is $(1-\bar{\alpha}_i)/\sqrt{\bar{\alpha}_i}$, but in the code, it is $-\sqrt{1/\bar{\alpha}_i-1}$.

Feb 08 '23 20:02 berthyf96

@berthyf96, regarding your second point about EpsilonXMeanProcessor.predict_xstart, I also did not understand the difference until I realized that the score function $\widehat{s}(x_t)$ associated with a noise predictor $\epsilon_\theta(x_t)$ is:

$$\widehat{s}(x_t) = \nabla_{x_t} \log p_\theta(x_t) = - \frac{1}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(x_t)$$

See Equation (11) here. Substituting this into the expression for $\widehat{x}_0$ in Alg. 1 gives the implemented expression.
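
The substitution can also be checked numerically. This is a small sketch in plain Python, not the repository's implementation: the function names are hypothetical, the score form assumes Alg. 1 computes $\widehat{x}_0 = (x_t + (1-\bar{\alpha}_t)\,\widehat{s}(x_t))/\sqrt{\bar{\alpha}_t}$, and the epsilon form assumes the usual $x_t/\sqrt{\bar{\alpha}_t} - \sqrt{1/\bar{\alpha}_t - 1}\,\epsilon_\theta(x_t)$.

```python
import math


def score_from_eps(eps, alpha_bar):
    """Convert a noise prediction to a score: s = -eps / sqrt(1 - alpha_bar)."""
    return -eps / math.sqrt(1.0 - alpha_bar)


def x0_from_score(x_t, score, alpha_bar):
    """Assumed Alg. 1 form: (x_t + (1 - alpha_bar) * score) / sqrt(alpha_bar)."""
    return (x_t + (1.0 - alpha_bar) * score) / math.sqrt(alpha_bar)


def x0_from_eps(x_t, eps, alpha_bar):
    """Assumed epsilon form: x_t / sqrt(alpha_bar) - sqrt(1 / alpha_bar - 1) * eps."""
    return x_t / math.sqrt(alpha_bar) - math.sqrt(1.0 / alpha_bar - 1.0) * eps


# Compare both forms over a few noise levels and scalar inputs.
max_gap = 0.0
for alpha_bar in (0.999, 0.9, 0.5, 0.1, 0.001):
    for x_t, eps in ((0.3, -1.2), (-0.7, 0.4), (1.5, 2.0)):
        via_score = x0_from_score(x_t, score_from_eps(eps, alpha_bar), alpha_bar)
        via_eps = x0_from_eps(x_t, eps, alpha_bar)
        max_gap = max(max_gap, abs(via_score - via_eps))

print(f"largest gap between the two forms: {max_gap:.2e}")
```

The two forms should agree up to floating-point rounding, since $(1-\bar{\alpha}_t)/\sqrt{\bar{\alpha}_t} \cdot 1/\sqrt{1-\bar{\alpha}_t} = \sqrt{1/\bar{\alpha}_t - 1}$.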

Mar 02 '23 08:03 claroche-r

@claroche-r thanks so much for clarifying that!

Mar 02 '23 20:03 berthyf96

thank you!

Apr 27 '24 04:04 Mally-cj