hyper-parameter learning
Hi Kipf, I would like to know how you did the hyper-parameter search. It would be helpful for applying this code to other datasets.
I did a very small-scale grid search around typical values for the learning rate, dropout, and hidden layer size on a validation set.
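Such a small grid search can be sketched as follows. This is only an illustration, not the repo's actual tuning script; `train_and_eval` is a hypothetical stand-in for training the model with given flags and returning a validation score (e.g. AUC), and the grid values are typical defaults, not the ones actually used.

```python
import itertools

# Illustrative grid of typical values (assumption, not the repo's actual search space).
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "dropout": [0.0, 0.2, 0.5],
    "hidden1": [16, 32, 64],
}

def train_and_eval(params):
    # Hypothetical: train the model with these hyper-parameters
    # (e.g. by invoking train.py with the matching flags) and
    # return a validation metric such as link-prediction AUC.
    ...

def grid_search(grid, eval_fn):
    """Exhaustively try every combination and keep the best-scoring one."""
    keys = list(grid)
    best_score, best_params = float("-inf"), None
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = eval_fn(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

For a handful of hyper-parameters with three or four candidate values each, this exhaustive loop is cheap enough that nothing fancier (random or Bayesian search) is needed.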
Thanks, Kipf, for the answer. I have another problem. I have a wiki dataset, and the gcn_ae model works perfectly fine on it. But when I use the gcn_vae model on the same dataset, it throws an error, and I don't understand what causes it. Can you please help? The (truncated) traceback starts with:
File "train.py", line 223, in
Looks like you have a nan or inf value somewhere in the model, e.g. in the variable sample_weight. You should be able to find the problem with a debugger.
I checked the dataset; there is no problem with it. As I mentioned earlier, the GCN_AE model works fine on this dataset, while the GCN_VAE model throws this error. In fact, the generated embedding is full of NaN values. What could possibly be going wrong?
Looks like the loss might become inf or nan at some point, in which case you get nan-valued gradients. This might come from the KL term in the VAE loss. You can try setting this term to zero (or removing it explicitly) to see whether it is the cause.
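To make the failure mode concrete, here is a NumPy sketch of the standard KL term for a diagonal Gaussian posterior against N(0, I), as used in VAE losses generally (the repo's actual loss lives in its optimizer code and may differ in scaling; the variable names here are illustrative):

```python
import numpy as np

def kl_term(z_mean, z_log_std):
    """Per-node KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * np.sum(
        1 + 2 * z_log_std - np.square(z_mean) - np.square(np.exp(z_log_std)),
        axis=1)

# With moderate log-std values the penalty is finite, but once z_log_std
# grows large, np.exp(z_log_std)**2 overflows to inf, the loss becomes
# inf, and nan then propagates back through the gradients.
```

This is why zeroing out the KL term is a useful diagnostic: if training becomes stable without it, the divergence is almost certainly entering through `exp(z_log_std)`.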
Thanks, Kipf, for the suggestion and your prompt responses. Removing only the KL term does not solve it; I also have to remove self.z = self.z_mean + tf.random_normal([self.n_samples, FLAGS.hidden2]) * tf.exp(self.z_log_std) from the model file. The common factor between the KL divergence and this term is tf.exp(self.z_log_std); after removing it from both, the model works. Correct me if I am wrong, but by removing the KL term, doesn't the VAE model reduce to the GAE model? At this point I can't figure out what is going wrong. Please provide some suggestions on how to handle this.
Thanks again. More info: on the wiki dataset, the model works best with these parameters: --model gcn_vae --learning_rate 0.0001 --epochs 50 --hidden1 500 --hidden2 128
Looks like the variance (tf.exp(self.z_log_std)) is diverging for some reason. Not sure why.
You can try our follow-up model, which should be more stable: https://nicola-decao.github.io/s-vae
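As a stopgap before switching models, a common workaround (not something this repo does, just a general stabilization trick) is to clamp `z_log_std` before exponentiating, so the variance can never overflow. A NumPy sketch, with `sample_z` and the clip bound chosen for illustration; in the TensorFlow model the equivalent would be wrapping `self.z_log_std` in `tf.clip_by_value`:

```python
import numpy as np

def sample_z(z_mean, z_log_std, clip=10.0, rng=None):
    """Reparameterization-trick sample with the log-std clamped for stability."""
    rng = rng or np.random.default_rng(0)
    # Clamp before exponentiating: exp(10) ~ 2.2e4, so the std can
    # no longer overflow to inf even if z_log_std diverges upward.
    z_log_std = np.clip(z_log_std, -clip, clip)
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + eps * np.exp(z_log_std)
```

This only masks the symptom (a diverging variance) rather than fixing its cause, but it can keep training alive long enough to inspect where the divergence starts.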
Thanks, Kipf for your help. I will use the follow-up model.