MetaApprox much higher accuracy for perturbed graph
Hi, Thanks for sharing the DeepRobust code.
I am testing the mettack.py code using test_mettack.py. When the model is A-Meta-Self, the accuracy on the attacked graph is much higher than with the Meta-Self model. For the Cora dataset with ptb_rate 0.2, the attacked-graph accuracy with the Meta-Self model is 0.4834, while the A-Meta-Self model outputs a much higher accuracy of 0.7596.
The A-Meta-Self model has a similar problem at other ptb_rate values. Does it need a special parameter setting? What are your test results for A-Meta-Self?
Hi, thanks for your interest in our repository!
I am not sure about the hyper-parameters of A-Meta-Self. In the paper, they only report the performance of A-Meta-Train and A-Meta-Both. I just tried A-Meta-Train on Cora with 0.2 ptb_rate and obtained a lower accuracy of 0.6725. In my view, the approximation of the meta-gradient is actually pretty aggressive, because we discard the whole training trajectory of the inner problem.
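To make the "discarded trajectory" point concrete, here is a toy scalar contrast (hypothetical names and loss, not the DeepRobust code) between the exact meta-gradient, which differentiates through the inner training steps, and an A-Meta-style approximation, which accumulates per-step gradients while detaching the weights after every update:

```python
import torch

torch.manual_seed(0)
lr, steps = 0.1, 3
delta = torch.tensor(0.5, requires_grad=True)   # stands in for adj_changes

def attack_loss(w, d):
    # Toy loss coupling the model weight w and the perturbation d.
    return (w * (1.0 + d)) ** 2

# Exact meta-gradient: keep the inner SGD trajectory in the autograd graph.
w = torch.tensor(1.0, requires_grad=True)
for _ in range(steps):
    g = torch.autograd.grad(attack_loss(w, delta), w, create_graph=True)[0]
    w = w - lr * g                                # differentiable update
exact = torch.autograd.grad(attack_loss(w, delta), delta)[0]

# A-Meta-style approximation: accumulate the per-step gradient w.r.t. delta,
# then detach the weights after every update, discarding the trajectory.
w = torch.tensor(1.0, requires_grad=True)
approx = torch.zeros(())
for _ in range(steps):
    loss = attack_loss(w, delta)
    approx = approx + torch.autograd.grad(loss, delta, retain_graph=True)[0]
    g = torch.autograd.grad(loss, w)[0]
    w = (w - lr * g).detach().requires_grad_()

print(exact.item(), approx.item())  # the two estimates differ noticeably
```

Even in this tiny example the two gradient estimates can disagree substantially, which is why the approximation can hurt some attack variants more than others.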
Thanks for your reply. I got a similar result for A-Meta-Train. Now I am just a little confused:
- Why is the attack result of `Meta-Self` much better than that of `Meta-Train`, while the result of `A-Meta-Self` is much worse than that of `A-Meta-Train`?
- In line 492 of `MetaApprox.inner_train()`, `self.adj_grad_sum += torch.autograd.grad(attack_loss, self.adj_changes, retain_graph=True)[0]`, why do you take the derivative of `attack_loss` w.r.t. `self.adj_changes` instead of `modified_adj`? The mettack paper seems to take the derivative w.r.t. the current adjacency matrix, which corresponds to `modified_adj` in your code. The result differs when using `modified_adj`.
Hi,
- I am not very sure about the reason behind this phenomenon. It could be that Meta-Self involves more label information, and once we approximate the training trajectory, A-Meta-Self simplifies too much to estimate the gradient direction accurately.
- We followed the authors' tensorflow implementation. See here. It should be fine to directly calculate the gradient w.r.t. A, as it is the same as the gradient w.r.t. ΔA. Note, however, that we apply a symmetrization operation to ΔA before computing ΔA+A. See https://github.com/DSE-MSU/DeepRobust/blob/2a52969fb8b881ac5325a8d0a26a6880aa8b6a9b/deeprobust/graph/global_attack/mettack.py#L70-L74
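A toy illustration of why the symmetrization matters for the gradient (dense 4x4 tensors and a simple Δ + Δᵀ symmetrization are assumptions here; see the linked lines for the actual implementation): once ΔA is symmetrized, the gradient autograd returns for the raw `adj_changes` picks up a transposed term.

```python
import torch

# Toy sketch: symmetrize the perturbation before adding it to the adjacency.
torch.manual_seed(0)
adj_changes = torch.rand(4, 4, requires_grad=True)
ori_adj = torch.rand(4, 4)

adj_changes_symm = adj_changes + adj_changes.t()   # enforce an undirected perturbation
modified_adj = adj_changes_symm + ori_adj
loss = (modified_adj ** 3).sum()                   # arbitrary differentiable loss

g_changes = torch.autograd.grad(loss, adj_changes, retain_graph=True)[0]
g_modified = torch.autograd.grad(loss, modified_adj)[0]

# Through the transpose, the gradient w.r.t. adj_changes is g + g.T,
# where g is the gradient w.r.t. modified_adj:
print(torch.allclose(g_changes, g_modified + g_modified.t()))  # True
```

So the two gradients only coincide when no symmetrization (or other reparametrization) sits between `adj_changes` and `modified_adj`.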
Yes, theoretically the gradient w.r.t. A or ΔA should be the same.
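A quick sanity check of that equality (a toy example with dense tensors, not the repo code): when `modified_adj = ori_adj + adj_changes` with no symmetrization in between, the map from ΔA to A is the identity plus a constant, so autograd returns identical gradients for both.

```python
import torch

# Toy check: with modified_adj = ori_adj + adj_changes and ori_adj constant,
# the gradient w.r.t. adj_changes equals the gradient w.r.t. modified_adj.
torch.manual_seed(0)
ori_adj = torch.rand(4, 4)
adj_changes = torch.rand(4, 4, requires_grad=True)

modified_adj = ori_adj + adj_changes
attack_loss = (modified_adj ** 2).sum()            # arbitrary differentiable loss

g_changes = torch.autograd.grad(attack_loss, adj_changes, retain_graph=True)[0]
g_modified = torch.autograd.grad(attack_loss, modified_adj)[0]

print(torch.allclose(g_changes, g_modified))  # True
```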
As for the symmetrization code, I compared `adj_changes_symm` and `self.adj_changes` right before here and found no difference. I compared them with the check `if (adj_changes_symm != self.adj_changes).sum().item() > 0:`.
I also compared the test results before and after changing here to `modified_adj = self.adj_changes + ori_adj`, for both the Self and A-Self models at ptb_rate 0.2. For the Self model the results are slightly different; for A-Self they are exactly the same. If the gradients w.r.t. A and ΔA are the same, shouldn't the test results be identical?