
about the prior distribution

Open yuffon opened this issue 6 years ago • 3 comments

In the top layer, the prior distribution uses h = conv(0) + embedding as the mean and std when 'ycond=True'. It seems that the conv layer is unnecessary.

yuffon avatar Jul 23 '19 02:07 yuffon

In the top prior layer, the mean and logs are shared across the spatial dimensions in the non-conditional case (ycond=False), i.e. (mean, logs) = tensor(1, 1, 1, 2n). In the implementation, we set (mean, logs) to be the bias of that conv layer and let conv(0) broadcast the bias to shape (batch_size, height, width, 2n). Nothing more than that. So you could replace it with (mean, logs) = tf.get_variable([1, 1, 1, 2*n]) and tf.tile() to get the right shape.
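To illustrate the equivalence, here is a NumPy sketch (a stand-in for the TF ops; the shape names are assumptions, not the repo's actual variable names). A convolution applied to an all-zeros input returns only its bias, broadcast over batch and spatial dimensions, which is exactly what a tiled (1, 1, 1, 2n) parameter tensor would give:

```python
import numpy as np

batch, height, width, n = 4, 8, 8, 3

# The conv layer's bias holds the learned (mean, logs) parameters.
bias = np.random.randn(2 * n)

# conv(0): convolving an all-zeros input yields just the bias,
# broadcast to (batch, height, width, 2n).
conv_of_zeros = np.zeros((batch, height, width, 2 * n)) + bias

# Equivalent alternative: declare the parameters directly
# (tf.get_variable([1, 1, 1, 2*n])) and tile to the full shape.
params = bias.reshape(1, 1, 1, 2 * n)
tiled = np.tile(params, (batch, height, width, 1))

assert np.allclose(conv_of_zeros, tiled)

# Split the last axis into the two prior parameters.
mean, logs = np.split(conv_of_zeros, 2, axis=-1)
```

Both routes produce the same spatially shared prior parameters; the conv(0) form just reuses existing layer machinery instead of a separate variable plus tile.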

So, it's just a programming trick.

naturomics avatar Jul 23 '19 15:07 naturomics

I see that the code uses

```
rescale = tf.get_variable("rescale", [], initializer=tf.constant_initializer(1.))
scale_shift = tf.get_variable("scale_shift", [], initializer=tf.constant_initializer(0.))
logsd = tf.tanh(logsd) * rescale + scale_shift
```

for the log-std. Is that necessary?

yuffon avatar Jul 24 '19 03:07 yuffon

It's for training stability, see the experiments section in our paper.
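A NumPy sketch of the transformation quoted above (the default scalar values 1.0 and 0.0 are from the quoted TF code; in training they are learned variables). tanh bounds the raw log-std to (-1, 1), so even extreme pre-activation values cannot blow up the scale of the Gaussian early in training:

```python
import numpy as np

def stabilized_logsd(raw_logsd, rescale=1.0, scale_shift=0.0):
    # tanh squashes raw_logsd into (-1, 1); the learned scalars
    # rescale/scale_shift then map it to a controlled range.
    return np.tanh(raw_logsd) * rescale + scale_shift

# Even extreme raw values stay bounded near [-rescale + shift, rescale + shift]:
out = stabilized_logsd(np.array([-100.0, 0.0, 100.0]))
```

Without the bound, a large logsd early in training makes exp(logsd) overflow or the likelihood gradient explode; the tanh keeps the standard deviation in a sane range while the scalars still let the model widen it if needed.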

naturomics avatar Jul 24 '19 06:07 naturomics