
I got problems about the BlogCatalog dataset

Open Tomposon opened this issue 6 years ago • 9 comments

I tried to run Runme.py, which trains on the BlogCatalog dataset. But when I used the embedding for node classification, the performance was terrible: the micro-F1 was around 0.2. Why?

Tomposon avatar Mar 28 '19 12:03 Tomposon

Thanks for your interest. Did you make the "Indices" in your evaluation consistent with the ones used in the embedding learning?

Thanks.

xhuang31 avatar Mar 28 '19 14:03 xhuang31

My label file follows the node IDs, from 0 to... What order do the rows of Embedding.mat follow in your source code?

Tomposon avatar Mar 28 '19 14:03 Tomposon

The order in Embedding.mat follows "Group1+Group2":

CombG = G[Group1+Group2, :][:, Group1+Group2]

This is for evaluation. Sorry for the confusion; I directly released the code from my own evaluation. I will update it when I get time.

Thanks.

xhuang31 avatar Mar 28 '19 14:03 xhuang31
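In other words, row i of Embedding.mat belongs to node (Group1+Group2)[i], not to node i, so a label vector indexed by original node ID has to be permuted into the same order before evaluation. A minimal sketch of that realignment (the index lists and labels here are made up for illustration):

```python
import numpy as np

# Hypothetical index lists, standing in for the Group1/Group2
# used in the author's code.
Group1 = [3, 0, 4]           # assumed first group of node IDs
Group2 = [1, 2]              # assumed second group of node IDs
order = Group1 + Group2      # row order of Embedding.mat

# Labels indexed by original node ID (0, 1, 2, ...).
labels = np.array([10, 11, 12, 13, 14])

# Row i of the embedding corresponds to node order[i],
# so reorder the labels the same way before evaluation:
labels_aligned = labels[order]
print(labels_aligned.tolist())  # [13, 10, 14, 11, 12]
```

If the labels are left in original node-ID order while the embedding rows follow Group1+Group2, the classifier is effectively trained on shuffled labels, which would explain a micro-F1 near chance level.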

Thank you very much. I have another question. In your source code, is the whole network used for training, rather than "removing the edges between train data and test data" as mentioned in your paper?

Tomposon avatar Mar 28 '19 15:03 Tomposon

Yes. CombG = G[Group1+Group2, :][:, Group1+Group2]

We use the whole network to train Embedding.mat. After getting Embedding.mat, you can do cross-validation on it.

Thanks.

xhuang31 avatar Mar 28 '19 15:03 xhuang31
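For reference, the learned matrix can be pulled out of Embedding.mat with scipy before cross-validating on it. The variable name "H" below is an assumption, so check the keys of the loaded dict against your actual file; the sketch writes a fake .mat file first so it is self-contained:

```python
import numpy as np
from scipy.io import savemat, loadmat

# Stand-in for the real Embedding.mat produced by Runme.py
# (the key "H" is a guess, not taken from the repository).
H_fake = np.random.rand(10, 8)
savemat("Embedding.mat", {"H": H_fake})

data = loadmat("Embedding.mat")
H = data["H"]        # one row per node, in Group1+Group2 order
print(H.shape)       # (10, 8)
```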

I made the "Indices" in my evaluation consistent with the ones in the embedding learning, but the performance on the Flickr dataset was still lower than in the paper. I just used the default parameters in your implementation. @xhuang31

Tomposon avatar Mar 29 '19 01:03 Tomposon

How about BlogCatalog? I used the SVM in MATLAB to perform the classification in my papers.

As long as you use the same classifier, you will get comparable results for AANE and the baselines. They may all get worse together, but AANE would still outperform the baselines in general.

Thanks.

xhuang31 avatar Mar 29 '19 02:03 xhuang31

I also used a linear SVM. I used 30% of the BlogCatalog nodes to train the classifier, and the micro-F1 is around 0.82. @xhuang31 Thanks for your attention.

Tomposon avatar Mar 29 '19 02:03 Tomposon

It is five-fold cross-validation, so 80% of the data should be used for training. Please check the paper. Thanks.

xhuang31 avatar Mar 29 '19 02:03 xhuang31
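For what it's worth, five-fold cross-validation trains each fold on 4/5 = 80% of the nodes and tests on the remaining 20%, which is where the 80% figure comes from. A sketch of that protocol with a linear SVM and micro-F1 in scikit-learn, run on a random stand-in for the real embedding matrix:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=200)                 # fake labels, 4 classes
# Fake 32-dim "embedding" whose class means are shifted apart,
# so the classifier has something to learn:
H = rng.normal(size=(200, 32)) + 0.5 * y[:, None]

# cv=5 => each fold trains on 160 nodes (80%) and tests on 40 (20%)
scores = cross_val_score(LinearSVC(), H, y, cv=5, scoring="f1_micro")
print(round(scores.mean(), 3))
```

Swap H and y for the rows of Embedding.mat and the labels reordered by Group1+Group2 to reproduce the paper's setup.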