LingxiaoShawn
LingxiaoShawn
From math perspective BatchNorm is normalizing all node embeddings in a batch to be Gaussian(0,1) distribution by subtracting the row-wise mean and dividing the row-wise standard deviation. In contrast, PairNorm...
Sorry for my late reply. The underlying problem is that Pytorch updated its package after 1.13 (I remember) and raised the bar of in-place operation: some in-place operation must be...
Good question, we empirically found scale-and-center works good, but intuitively the explanation is not as good as PN. In fact, I was designing scale-and-center at the beginning to **approximately** achieve...