Question about SNGP differences from the paper
Hi. Thanks for keeping everything updated here. I noticed some differences between the SNGP implementation here and what is described in the paper, which leave me with a few questions:
- The default settings here: https://github.com/google/uncertainty-baselines/blob/main/baselines/cifar/sngp.py#L155 place the ridge penalty at `1` instead of `1e-3` as mentioned in Table 5 of the paper. Is this a result of using the Gaussian likelihood instead of the logistic?
- In conjunction with the point above, the predictive variance seems to include an extra multiplication that does not appear in the paper's equations (https://github.com/google/edward2/blob/main/edward2/tensorflow/layers/random_feature.py#L456): the ridge penalty multiplies the covariance again after the precision matrix is inverted (see the sketch after this list). Was this used in the original experiments, and how is it justified?
- Regarding this comment: https://github.com/google/uncertainty-baselines/issues/258, it says that the current code uses the Gaussian likelihood for simplification, but the paper appears to follow a one-vs-all logistic regression. Was a one-vs-all logistic regression used in the original training? Or was it always a softmax even though the likelihood is logistic, or something else entirely?
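To make the second point concrete, here is a minimal NumPy sketch of how I read the variance computation; the shapes, the `ridge_penalty` value, and the Gaussian-likelihood precision update are my own assumptions for illustration, not the actual edward2 code:

```python
import numpy as np

# Hypothetical shapes and ridge penalty, just to make the sketch runnable.
num_features, batch_size, ridge_penalty = 1024, 8, 1.0

rng = np.random.default_rng(0)
phi = rng.normal(size=(batch_size, num_features))  # random features Phi(x_i)

# Gaussian-likelihood precision update: s * I + sum_i Phi_i Phi_i^T.
precision = ridge_penalty * np.eye(num_features) + phi.T @ phi
covariance = np.linalg.inv(precision)

# Variance as I read the paper: var(x) = Phi(x)^T Sigma Phi(x).
var_paper = np.einsum('bk,kl,bl->b', phi, covariance, phi)

# Variance as I read the linked code: one extra multiplication by the
# ridge penalty after the inversion.
var_code = ridge_penalty * var_paper
```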
Sorry to bother you. I noticed that in the SNGP paper there are K precision matrices of size (B, B), but in the code there is only one. Does this correspond to your third question? I'm new to uncertainty research, and this confused me about how to use the covariance matrix (a sketch of how I understand the two variants is below).
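In case it helps clarify the comparison, here is a rough sketch of the two variants as I understand them; the shapes and the Laplace-style p(1-p) weighting follow my reading of the paper, and none of the names come from the actual code:

```python
import numpy as np

# Hypothetical shapes for illustration only.
num_classes, num_features, batch_size = 10, 128, 8

rng = np.random.default_rng(0)
phi = rng.normal(size=(batch_size, num_features))             # random features Phi(x_i)
probs = rng.dirichlet(np.ones(num_classes), size=batch_size)  # predicted probabilities p_ik

# Paper (logistic likelihood, Laplace approximation): one precision matrix
# per class, each update weighted by p_ik * (1 - p_ik).
precision_per_class = np.stack([
    np.eye(num_features)
    + (phi * (probs[:, k] * (1 - probs[:, k]))[:, None]).T @ phi
    for k in range(num_classes)
])  # shape (K, D, D)

# Code (Gaussian likelihood): a single precision matrix shared by all classes,
# which is how I read the single matrix in random_feature.py.
precision_shared = np.eye(num_features) + phi.T @ phi          # shape (D, D)
```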