Prince
According to the original VAE paper[1], BCE is used because the decoder is implemented as an MLP followed by a sigmoid, so its output can be viewed as the parameter of a Bernoulli distribution. You can use MSE if you...
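To make the Bernoulli reading concrete, here is a minimal NumPy sketch showing that BCE on a sigmoid output is the same number as the negative Bernoulli log-likelihood. The function names and the toy values are my own for illustration, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    # Maps raw decoder outputs (logits) to probabilities in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def bce(x, p, eps=1e-12):
    # Element-wise binary cross-entropy between targets x and probabilities p.
    p = np.clip(p, eps, 1 - eps)
    return -(x * np.log(p) + (1 - x) * np.log(1 - p))

def bernoulli_log_prob(x, p, eps=1e-12):
    # Log-probability of binary x under Bernoulli(p): log p if x == 1, else log(1 - p).
    p = np.clip(p, eps, 1 - eps)
    return np.where(x == 1, np.log(p), np.log(1 - p))

# Hypothetical decoder logits and binary targets, for illustration only.
logits = np.array([-2.0, 0.0, 3.0])
x = np.array([0.0, 1.0, 1.0])

p = sigmoid(logits)
recon_loss = bce(x, p).sum()
neg_log_lik = -bernoulli_log_prob(x, p).sum()

# The two quantities agree up to floating-point error.
print(recon_loss, neg_log_lik)
```

So minimizing BCE here is just maximum likelihood under a Bernoulli decoder, at least for binary targets.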
Yes, that is the idea. However, I don't think tanh would be an appropriate choice for the output layer in this case, as we are fitting values in [0,1] rather...
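A quick way to see the range issue is to compare the two activations on a few inputs. This is only a sketch with made-up inputs: sigmoid stays inside (0, 1), while tanh also produces negative values, which cannot serve as Bernoulli probabilities.

```python
import numpy as np

# Hypothetical pre-activation values, for illustration only.
z = np.linspace(-5.0, 5.0, 11)

sig_out = 1.0 / (1.0 + np.exp(-z))   # sigmoid output
tanh_out = np.tanh(z)                # tanh output

# True if every output lies in the unit interval.
sig_in_unit = bool(np.all((sig_out > 0) & (sig_out < 1)))
tanh_in_unit = bool(np.all((tanh_out >= 0) & (tanh_out <= 1)))

print(sig_in_unit, tanh_in_unit)
```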
I think it would not be a proper probability distribution if the flow is not invertible, since transforming the density of a random variable requires the determinant of the Jacobian of the inverse map.
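As a rough numerical check of why the Jacobian term matters, consider a one-dimensional invertible map x = exp(z) with z drawn from a standard normal. This toy flow and the function names are my own assumptions, chosen only because the inverse and its derivative are easy to write down. With the |dz/dx| factor the transformed density integrates to about 1; without it, it does not.

```python
import numpy as np

def std_normal_pdf(z):
    # Density of the base variable z ~ N(0, 1).
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def flow_density(x):
    # Change of variables for x = exp(z):
    # inverse map z = log(x), with derivative dz/dx = 1/x.
    return std_normal_pdf(np.log(x)) * np.abs(1.0 / x)

# Simple Riemann sum over a grid covering almost all of the mass.
x = np.linspace(1e-6, 200.0, 400_001)
dx = x[1] - x[0]

with_jacobian = np.sum(flow_density(x)) * dx
without_jacobian = np.sum(std_normal_pdf(np.log(x))) * dx

print(with_jacobian, without_jacobian)
```

If the map were not invertible, there would be no single inverse whose Jacobian could be plugged into this formula.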