DeepInf
tanh while calculating attention scores
Hey! I was interested in why you are using tanh here:
attn_src = torch.matmul(F.tanh(h_prime), self.a_src) # bs x n_head x n x 1
in BatchMultiHeadGraphAttention (get_layers.py). Did it stabilize the training? Is it some form of feature normalization?
Thanks for pointing this out.
Yes, in the original GAT paper, they don't have the tanh activation. But we found that it helps our training a little.
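For context, here is a minimal sketch of how that tanh fits into a GAT-style attention score computation. Only `h_prime` and `a_src` come from the quoted line; all other names, shapes, and the `a_dst`/LeakyReLU/softmax steps are assumptions for illustration, not the repository's exact code. The sketch uses `torch.tanh`, which is equivalent to the quoted `F.tanh`.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: batch, heads, nodes, output feature dim.
bs, n_head, n, f_out = 2, 4, 10, 16
h_prime = torch.randn(bs, n_head, n, f_out)  # transformed node features
a_src = torch.randn(n_head, f_out, 1)        # per-head attention vector (source side, assumed)
a_dst = torch.randn(n_head, f_out, 1)        # per-head attention vector (target side, assumed)

# Bounding h_prime with tanh keeps the pre-softmax scores in a limited range,
# which is presumably the small training benefit mentioned in the answer above.
attn_src = torch.matmul(torch.tanh(h_prime), a_src)  # bs x n_head x n x 1
attn_dst = torch.matmul(torch.tanh(h_prime), a_dst)  # bs x n_head x n x 1

# Pairwise scores e_ij = attn_src_i + attn_dst_j, then LeakyReLU + softmax as in GAT.
attn = attn_src.expand(-1, -1, -1, n) + attn_dst.expand(-1, -1, -1, n).permute(0, 1, 3, 2)
attn = F.softmax(F.leaky_relu(attn, negative_slope=0.2), dim=-1)  # bs x n_head x n x n
print(attn.shape)
```

Without the tanh, the scores are an unbounded linear function of h_prime, so the design choice trades a bit of expressiveness for a bounded input to the softmax.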