MetaApprox much higher accuracy for perturbed graph
Hi, Thanks for sharing the DeepRobust code.
I am testing the mettack.py code using test_mettack.py. When the model is A-Meta-Self, the accuracy on the attacked graph is much higher than with the Meta-Self model. For the Cora dataset with ptb_rate 0.2, the attacked-graph accuracy with the Meta-Self model is 0.4834, while the A-Meta-Self model outputs a much higher accuracy of 0.7596.
The A-Meta-Self model has a similar problem at other ptb_rate values. Does it need a special parameter setting? What are your test results for A-Meta-Self?
Hi, thanks for your interest in our repository!
I am not sure about the hyper-parameters of A-Meta-Self. In the paper, they only report the performance of A-Meta-Train and A-Meta-Both. I just tried A-Meta-Train on Cora with 0.2 ptb_rate and obtained a lower accuracy of 0.6725. In my view, the approximation of the meta-gradient is actually pretty aggressive, because we discard the whole training trajectory of the inner problem.
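To make the "discarded trajectory" point concrete, here is a toy scalar contrast (hypothetical names and loss, not the DeepRobust code) between the exact meta-gradient, which differentiates through the inner training steps, and an A-Meta-style approximation, which accumulates per-step gradients while detaching the weights after every update:

```python
import torch

torch.manual_seed(0)
lr, steps = 0.1, 3
delta = torch.tensor(0.5, requires_grad=True)   # stands in for adj_changes

def attack_loss(w, d):
    # Toy loss coupling the model weight w and the perturbation d.
    return (w * (1.0 + d)) ** 2

# Exact meta-gradient: keep the inner SGD trajectory in the autograd graph.
w = torch.tensor(1.0, requires_grad=True)
for _ in range(steps):
    g = torch.autograd.grad(attack_loss(w, delta), w, create_graph=True)[0]
    w = w - lr * g                                # differentiable update
exact = torch.autograd.grad(attack_loss(w, delta), delta)[0]

# A-Meta-style approximation: accumulate the per-step gradient w.r.t. delta,
# then detach the weights after every update, discarding the trajectory.
w = torch.tensor(1.0, requires_grad=True)
approx = torch.zeros(())
for _ in range(steps):
    loss = attack_loss(w, delta)
    approx = approx + torch.autograd.grad(loss, delta, retain_graph=True)[0]
    g = torch.autograd.grad(loss, w)[0]
    w = (w - lr * g).detach().requires_grad_()

print(exact.item(), approx.item())  # the two estimates differ noticeably
```

Even in this tiny example the two gradient estimates can disagree substantially, which is why the approximation can hurt some attack variants more than others.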
Thanks for your reply. I got a similar result for A-Meta-Train. Now I am just a little confused:
- Why is the attack result of `Meta-Self` much better than that of `Meta-Train`, while the result of `A-Meta-Self` is much worse than that of `A-Meta-Train`?
- In line 492 of `MetaApprox.inner_train()`, `self.adj_grad_sum += torch.autograd.grad(attack_loss, self.adj_changes, retain_graph=True)[0]`, why do you take the derivative of `attack_loss` w.r.t. `self.adj_changes` instead of `modified_adj`? The mettack paper seems to take the derivative w.r.t. the current adjacency matrix, which corresponds to `modified_adj` in your code. The result differs when using `modified_adj`.
Hi,
- I am not very sure about the reason behind this phenomenon. It could be that Meta-Self involves more label information, and once we approximate the training trajectory, A-Meta-Self simplifies too much to estimate the gradient direction accurately.
- We followed the authors' tensorflow implementation. See here. It should be fine to directly calculate the gradient w.r.t. A, as it is the same as the gradient w.r.t. ΔA. Note, however, that we apply a symmetrization operation to ΔA before computing ΔA+A. See https://github.com/DSE-MSU/DeepRobust/blob/2a52969fb8b881ac5325a8d0a26a6880aa8b6a9b/deeprobust/graph/global_attack/mettack.py#L70-L74
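A toy illustration of why the symmetrization matters for the gradient (dense 4x4 tensors and a simple Δ + Δᵀ symmetrization are assumptions here; see the linked lines for the actual implementation): once ΔA is symmetrized, the gradient autograd returns for the raw `adj_changes` picks up a transposed term.

```python
import torch

# Toy sketch: symmetrize the perturbation before adding it to the adjacency.
torch.manual_seed(0)
adj_changes = torch.rand(4, 4, requires_grad=True)
ori_adj = torch.rand(4, 4)

adj_changes_symm = adj_changes + adj_changes.t()   # enforce an undirected perturbation
modified_adj = adj_changes_symm + ori_adj
loss = (modified_adj ** 3).sum()                   # arbitrary differentiable loss

g_changes = torch.autograd.grad(loss, adj_changes, retain_graph=True)[0]
g_modified = torch.autograd.grad(loss, modified_adj)[0]

# Through the transpose, the gradient w.r.t. adj_changes is g + g.T,
# where g is the gradient w.r.t. modified_adj:
print(torch.allclose(g_changes, g_modified + g_modified.t()))  # True
```

So the two gradients only coincide when no symmetrization (or other reparametrization) sits between `adj_changes` and `modified_adj`.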
Yes, theoretically the gradient w.r.t. A or ΔA should be the same.
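A quick sanity check of that equality (a toy example with dense tensors, not the repo code): when `modified_adj = ori_adj + adj_changes` with no symmetrization in between, the map from ΔA to A is the identity plus a constant, so autograd returns identical gradients for both.

```python
import torch

# Toy check: with modified_adj = ori_adj + adj_changes and ori_adj constant,
# the gradient w.r.t. adj_changes equals the gradient w.r.t. modified_adj.
torch.manual_seed(0)
ori_adj = torch.rand(4, 4)
adj_changes = torch.rand(4, 4, requires_grad=True)

modified_adj = ori_adj + adj_changes
attack_loss = (modified_adj ** 2).sum()            # arbitrary differentiable loss

g_changes = torch.autograd.grad(attack_loss, adj_changes, retain_graph=True)[0]
g_modified = torch.autograd.grad(attack_loss, modified_adj)[0]

print(torch.allclose(g_changes, g_modified))  # True
```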
As for the symmetrization code, I compared `adj_changes_symm` and `self.adj_changes` right before here and found no difference. I compared them with the check `if (adj_changes_symm != self.adj_changes).sum().item() > 0:`.
I also compared the test results before and after changing here to `modified_adj = self.adj_changes + ori_adj`, for both the Self and A-Self models at ptb_rate 0.2. For the Self model the results are slightly different; for A-Self they are exactly the same. If the gradients w.r.t. A and ΔA are the same, shouldn't the test results be identical?