Paper & implementation differences
Hi, there are a few differences between the paper and this repository, and it would be wonderful if you could clarify the reasons behind them:
- The Gaussian-noise experiments reported in the paper use sigma_y=0.05, and indeed the config files set `config['noise']['sigma']=0.05`. However, the images are stretched from [0,1] to [-1,1] while sigma is left unchanged, meaning that in practice the added noise has std sigma/2 on the original [0,1] scale, i.e. `y_n` is cleaner than in the settings reported in the paper. This can be easily checked by computing `torch.std(y - y_n)` after the creation of `y` and `y_n` in `sample_condition.py`.
- The paper defines the step-size scalar as a constant divided by the norm of the gradient (Appendix C.2), meaning that the gradient is always normalized before it is scaled. In the code, the constant is defined in `config['conditioning']['params']['scale']` and used in `PosteriorSampling.conditioning()` to scale the gradient, but the gradient is never normalized in the first place (in `PosteriorSampling.grad_and_value()`, for example). Adding the gradient normalization seems to break the method.
- For the Gaussian FFHQ-SRx4 case, Appendix D.1 defines the scale as 1.0, but `configs/super_resolution_config.yaml` uses 0.3.
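To make the first point concrete, here is a minimal sketch of the scale mismatch. It does not use the repository's code: the pixel values are synthetic, and `add_noise` is a hypothetical helper standing in for whatever the noiser does. It only illustrates that noise with std 0.05 added after stretching to [-1,1] corresponds to std 0.025 when mapped back to [0,1].

```python
import random
import statistics


def add_noise(values, sigma, rng):
    """Add i.i.d. zero-mean Gaussian noise with std `sigma` (hypothetical helper)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(0)
sigma = 0.05  # the value quoted above for config['noise']['sigma']

# Synthetic "image" with pixel values in [0, 1] (an assumption for illustration).
y01 = [rng.random() for _ in range(100_000)]

# Stretch to [-1, 1], then add noise with sigma left unchanged.
y = [2.0 * v - 1.0 for v in y01]
y_n = add_noise(y, sigma, rng)

# Noise std measured on the [-1, 1] scale.
noise = [b - a for a, b in zip(y, y_n)]
std_stretched = statistics.pstdev(noise)

# Mapping back to [0, 1] via (v + 1) / 2 halves every difference.
std_unit = statistics.pstdev([n / 2.0 for n in noise])

print(f"std on [-1, 1]: {std_stretched:.4f}")
print(f"std on [0, 1]:  {std_unit:.4f}")
```

Under these assumptions the first number comes out close to `sigma` and the second close to `sigma / 2`.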
Thank you for your time and effort!
For (2), I think the authors apply the normalization factor before taking the gradient. If you look at `ConditioningMethod.grad_and_value` (here), they take the gradient of the norm, not the norm squared.
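One way to read the remark above: by the chain rule, the gradient of the residual norm equals the gradient of the squared norm divided by twice the norm, so differentiating the norm already includes a 1/norm factor. The sketch below checks this numerically on a toy linear problem. The operator `A`, the vectors `x` and `y`, and the helpers `residual`, `norm`, and `grad_fd` are all hypothetical; it uses finite differences instead of the repository's autograd path.

```python
import math


def residual(x, y, A):
    """Return y - A x for a small dense matrix A given as a list of rows."""
    return [yi - sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def grad_fd(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += eps
        xm[k] -= eps
        grad.append((f(xp) - f(xm)) / (2.0 * eps))
    return grad


# Toy linear measurement model (all values are made up for illustration).
A = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
y = [1.0, -2.0]
x = [0.5, -0.3, 0.8]

r_norm = norm(residual(x, y, A))

# Gradient of ||y - A x|| versus gradient of ||y - A x||^2.
grad_norm = grad_fd(lambda z: norm(residual(z, y, A)), x)
grad_sq = grad_fd(lambda z: norm(residual(z, y, A)) ** 2, x)

# Chain rule: grad ||r|| = grad ||r||^2 / (2 ||r||).
rescaled = [g / (2.0 * r_norm) for g in grad_sq]

print(grad_norm)
print(rescaled)
```

The two printed vectors should agree up to finite-difference error, which is consistent with the normalization being absorbed into the objective rather than applied to the gradient afterwards.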
I believe there's another difference between Alg. 1 of the paper and the code. In `EpsilonXMeanProcessor.predict_xstart` (here), the coefficient applied to the score-model output differs from the coefficient in line 4 of Alg. 1. In the paper, the coefficient is $(1-\bar{\alpha}_i)/\sqrt{\bar{\alpha}_i}$, but in the code, it is $-\sqrt{1/\bar{\alpha}_i-1}$.
@berthyf96, regarding your second point about `EpsilonXMeanProcessor.predict_xstart`, I also did not understand the difference until I realized that the score function $\widehat{s}(x_t)$ associated with a noise predictor $\epsilon_\theta(x_t)$ is:

$$\widehat{s}(x_t) = \nabla_{x_t} \log p_\theta(x_t) = - \frac{1}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(x_t)$$

See Equation (11) here. Injecting this result into the expression of $\widehat{x}_0$ in Alg. 1 gives the implemented result.
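Spelling out the substitution for the coefficient alone, using only the two expressions quoted in this thread (the paper's coefficient on $\widehat{s}(x_t)$ and the score/noise relation above):

$$
\begin{aligned}
\frac{1-\bar{\alpha}_t}{\sqrt{\bar{\alpha}_t}}\,\widehat{s}(x_t)
&= -\frac{1-\bar{\alpha}_t}{\sqrt{\bar{\alpha}_t}\,\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t) \\
&= -\sqrt{\frac{1-\bar{\alpha}_t}{\bar{\alpha}_t}}\,\epsilon_\theta(x_t) \\
&= -\sqrt{\frac{1}{\bar{\alpha}_t}-1}\,\epsilon_\theta(x_t)
\end{aligned}
$$

So the two coefficients multiply different quantities (the score versus the predicted noise) and yield the same term in $\widehat{x}_0$, assuming the code applies its coefficient to $\epsilon_\theta(x_t)$.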
@claroche-r thanks so much for clarifying that!
thank you!