
I got problems about the BlogCatalog dataset

Open Tomposon opened this issue 6 years ago • 9 comments

I tried to run Runme.py, which trains on the BlogCatalog dataset. But when I used the embedding for node classification, the performance was terrible: the micro-F1 was around 0.2. Why?

Tomposon avatar Mar 28 '19 12:03 Tomposon

Thanks for your interest. Did you make the "Indices" in your evaluation consistent with the ones used in the embedding learning?

Thanks.

xhuang31 avatar Mar 28 '19 14:03 xhuang31

My label file follows the node IDs, from 0 to... What order do the rows of Embedding.mat follow in your source code?

Tomposon avatar Mar 28 '19 14:03 Tomposon

The order in Embedding.mat follows "Group1+Group2":

CombG = G[Group1+Group2, :][:, Group1+Group2]

This is for evaluation. Sorry for the confusion; I directly released the code from my own evaluation. I will update it when I get time.

Thanks.

xhuang31 avatar Mar 28 '19 14:03 xhuang31
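In other words, row i of Embedding.mat belongs to node (Group1+Group2)[i], not to node i, so a label vector indexed by original node ID has to be permuted into the same order before evaluation. A minimal sketch of that realignment (the index lists and labels here are made up for illustration):

```python
import numpy as np

# Hypothetical index lists, standing in for the Group1/Group2
# used in the author's code.
Group1 = [3, 0, 4]           # assumed first group of node IDs
Group2 = [1, 2]              # assumed second group of node IDs
order = Group1 + Group2      # row order of Embedding.mat

# Labels indexed by original node ID (0, 1, 2, ...).
labels = np.array([10, 11, 12, 13, 14])

# Row i of the embedding corresponds to node order[i],
# so reorder the labels the same way before evaluation:
labels_aligned = labels[order]
print(labels_aligned.tolist())  # [13, 10, 14, 11, 12]
```

If the labels are left in original node-ID order while the embedding rows follow Group1+Group2, the classifier is effectively trained on shuffled labels, which would explain a micro-F1 near chance level.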

Thank you very much. I have another question. In your source code, is the whole network used for training, rather than "removing the edges between train data and test data" as mentioned in your paper?

Tomposon avatar Mar 28 '19 15:03 Tomposon

Yes. CombG = G[Group1+Group2, :][:, Group1+Group2]

We use the whole network to train Embedding.mat. After getting Embedding.mat, you can do cross-validation on it.

Thanks.

xhuang31 avatar Mar 28 '19 15:03 xhuang31
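For reference, the learned matrix can be pulled out of Embedding.mat with scipy before cross-validating on it. The variable name "H" below is an assumption, so check the keys of the loaded dict against your actual file; the sketch writes a fake .mat file first so it is self-contained:

```python
import numpy as np
from scipy.io import savemat, loadmat

# Stand-in for the real Embedding.mat produced by Runme.py
# (the key "H" is a guess, not taken from the repository).
H_fake = np.random.rand(10, 8)
savemat("Embedding.mat", {"H": H_fake})

data = loadmat("Embedding.mat")
H = data["H"]        # one row per node, in Group1+Group2 order
print(H.shape)       # (10, 8)
```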

I made the "Indices" in my evaluation consistent with the ones in the embedding learning, but the performance on the Flickr dataset was still lower than in the paper. I just used the default parameters in your implementation. @xhuang31

Tomposon avatar Mar 29 '19 01:03 Tomposon

How about BlogCatalog? I used the SVM in MATLAB to perform the classification in my papers.

As long as you use the same classifier, you will get comparable results for AANE and the baselines. They may all get worse together, but AANE would still outperform the baselines in general.

Thanks.

xhuang31 avatar Mar 29 '19 02:03 xhuang31

I also used a linear SVM. I used 30% of the BlogCatalog nodes to train the classifier, and the micro-F1 is around 0.82. @xhuang31 Thanks for your attention.

Tomposon avatar Mar 29 '19 02:03 Tomposon

It is five-fold cross-validation, so 80% of the data should be used for training. Please check the paper. Thanks.

xhuang31 avatar Mar 29 '19 02:03 xhuang31
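For what it's worth, five-fold cross-validation trains each fold on 4/5 = 80% of the nodes and tests on the remaining 20%, which is where the 80% figure comes from. A sketch of that protocol with a linear SVM and micro-F1 in scikit-learn, run on a random stand-in for the real embedding matrix:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=200)                 # fake labels, 4 classes
# Fake 32-dim "embedding" whose class means are shifted apart,
# so the classifier has something to learn:
H = rng.normal(size=(200, 32)) + 0.5 * y[:, None]

# cv=5 => each fold trains on 160 nodes (80%) and tests on 40 (20%)
scores = cross_val_score(LinearSVC(), H, y, cv=5, scoring="f1_micro")
print(round(scores.mean(), 3))
```

Swap H and y for the rows of Embedding.mat and the labels reordered by Group1+Group2 to reproduce the paper's setup.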