The generative loss in implementation
In the paper, the objective function to minimize is `-log p(x) - M*log(a)`, where `M` is the dimensionality of `x` and `a = 1/n_bins` is the discretization level of the data.
However, in the code, the constant `c` is first added to the objective (together with `logpz`), and then a negative sign is applied to obtain the generative loss:
https://github.com/openai/glow/blob/eaff2177693a5d84a1cf8ae19e8e0441715b82f8/model.py#L172
https://github.com/openai/glow/blob/eaff2177693a5d84a1cf8ae19e8e0441715b82f8/model.py#L181
https://github.com/openai/glow/blob/eaff2177693a5d84a1cf8ae19e8e0441715b82f8/model.py#L184
It seems to minimize `-log p(x) + M*log(a)`, not the loss written in the paper, which is `-log p(x) - M*log(a)`. Do you ignore the constant because it does not affect training, or did I miss something in the code?
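For reference, a minimal sketch of what those linked lines compute (the function name and default shapes are mine; `logpz` and `sum_logdet` stand for the prior log-density and the accumulated log-determinants from the flow):

```python
import numpy as np

def generative_loss(logpz, sum_logdet, image_shape=(32, 32, 3), n_bins=256):
    """Hypothetical paraphrase of the linked model.py lines."""
    M = np.prod(image_shape)               # number of dimensions of x
    objective = logpz + sum_logdet         # log p(x) up to the constant
    objective += -np.log(n_bins) * M       # c = M*log(a), with a = 1/n_bins
    return -objective / (np.log(2.0) * M)  # negated objective, in bits/dim
```

With `logpz = sum_logdet = 0`, the loss reduces to `log(n_bins)/log(2)`, i.e. 8 bits per dimension for `n_bins = 256`.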
It's the same objective, just written with the sign folded in:

`log(a) = log(1 / n_bins) = -log(n_bins)`

so adding `-log(n_bins)` per dimension in the code is exactly adding `M*log(a)`, and after the final negation the loss is `-log p(x) - M*log(a)`, as in the paper.
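A quick numeric check of this identity, with made-up values for the log-likelihood and dimensionality:

```python
import math

n_bins = 256
M = 3072                 # e.g. a 32x32x3 image
a = 1.0 / n_bins         # discretization level
logpx = -5000.0          # made-up log-likelihood value

paper_loss = -logpx - M * math.log(a)            # as written in the paper
code_loss = -(logpx + (-math.log(n_bins)) * M)   # as computed in the code
assert math.isclose(paper_loss, code_loss)
```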
Just to clarify: the purpose of the constant "scaling penalty" `c` is just to ensure accurate likelihood computation? The minimizer would be the same with or without `c`. Comparison or model selection on the basis of likelihood computation is also a bit iffy though, isn't it?
Given that a normalizing flow gives you the exact log-likelihood of your data under your model, it would be a shame to omit `c`, even though it is technically not required for optimization. Model scoring/selection can be done using the log-likelihood of test data under the model; the superiority of one model can, for example, be demonstrated with a likelihood-ratio test.
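A sketch of that kind of scoring, assuming each model exposes a log-density function (the toy Gaussian stand-ins below are mine, not part of Glow): rank models by mean held-out log-likelihood; for nested models the statistic `2*N*(ll_1 - ll_0)` can then be referred to a chi-squared distribution.

```python
import math

def mean_test_loglik(loglik_fn, test_set):
    # Average exact log-likelihood of held-out data under the model.
    return sum(loglik_fn(x) for x in test_set) / len(test_set)

# Toy stand-ins for two models' exact log-densities (hypothetical):
model_a = lambda x: -0.5 * x**2 - 0.5 * math.log(2 * math.pi)        # N(0, 1)
model_b = lambda x: -0.5 * (x - 1)**2 - 0.5 * math.log(2 * math.pi)  # N(1, 1)

test_set = [0.1, -0.3, 0.2, 0.0, -0.1]
ll_a = mean_test_loglik(model_a, test_set)
ll_b = mean_test_loglik(model_b, test_set)
assert ll_a > ll_b  # data centered at 0 favors N(0, 1)
```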
Thank you for the explanation!