DeepInf
tanh while calculating attention scores
Hey! I was interested in why you are using tanh here:
attn_src = torch.matmul(F.tanh(h_prime), self.a_src) # bs x n_head x n x 1
in BatchMultiHeadGraphAttention (get_layers.py). Did it stabilize the training? Is it some form of feature normalization?
Thanks for pointing this out.
Yes, in the original GAT paper, they don't have the tanh activation. But we found that it helps our training a little.
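For context, here is a minimal sketch of how that tanh fits into a GAT-style attention score computation. Only `h_prime` and `a_src` come from the quoted line; all other names, shapes, and the `a_dst`/LeakyReLU/softmax steps are assumptions for illustration, not the repository's exact code. The sketch uses `torch.tanh`, which is equivalent to the quoted `F.tanh`.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: batch, heads, nodes, output feature dim.
bs, n_head, n, f_out = 2, 4, 10, 16
h_prime = torch.randn(bs, n_head, n, f_out)  # transformed node features
a_src = torch.randn(n_head, f_out, 1)        # per-head attention vector (source side, assumed)
a_dst = torch.randn(n_head, f_out, 1)        # per-head attention vector (target side, assumed)

# Bounding h_prime with tanh keeps the pre-softmax scores in a limited range,
# which is presumably the small training benefit mentioned in the answer above.
attn_src = torch.matmul(torch.tanh(h_prime), a_src)  # bs x n_head x n x 1
attn_dst = torch.matmul(torch.tanh(h_prime), a_dst)  # bs x n_head x n x 1

# Pairwise scores e_ij = attn_src_i + attn_dst_j, then LeakyReLU + softmax as in GAT.
attn = attn_src.expand(-1, -1, -1, n) + attn_dst.expand(-1, -1, -1, n).permute(0, 1, 3, 2)
attn = F.softmax(F.leaky_relu(attn, negative_slope=0.2), dim=-1)  # bs x n_head x n x n
print(attn.shape)
```

Without the tanh, the scores are an unbounded linear function of h_prime, so the design choice trades a bit of expressiveness for a bounded input to the softmax.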