Reproduce Table 4 in paper
Hi --
Can you point me to the code needed to reproduce the results in Table 4 of the paper? I ran
graphvite baseline deepwalk_youtube
which produced
----------- node classification ------------
effective labels: 50691 / 50767
macro-F1@1%: 0.310211
macro-F1@2%: 0.342265
macro-F1@3%: 0.352457
macro-F1@4%: 0.362396
macro-F1@5%: 0.367422
macro-F1@6%: 0.374697
macro-F1@7%: 0.376081
macro-F1@8%: 0.381525
macro-F1@9%: 0.381062
macro-F1@10%: 0.384292
micro-F1@1%: 0.379791
micro-F1@2%: 0.410207
micro-F1@3%: 0.422721
micro-F1@4%: 0.433862
micro-F1@5%: 0.441079
micro-F1@6%: 0.448772
micro-F1@7%: 0.451162
micro-F1@8%: 0.457318
micro-F1@9%: 0.45942
micro-F1@10%: 0.462941
Those results are in the same ballpark, but they differ from Table 4 by a few percentage points. Is the command above correct? Or is this kind of variation expected?
Thanks!
Hi. It's expected.
The original paper evaluates with liblinear, while here the classifier is evaluated with PyTorch. liblinear is optimized by second-order methods and its stopping criterion differs from our PyTorch implementation.
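A toy illustration of why the numbers can drift: two logistic-regression solvers with different optimization methods and stopping criteria land at slightly different weights, hence slightly different F1. This is a hypothetical sketch using scikit-learn's `liblinear` solver next to a first-order solver, not the actual evaluation code from either paper:

```python
# Sketch: same data, two solvers with different optimizers / stop criteria.
# This is illustrative only, not GraphVite's or DeepWalk's evaluation code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                              # stand-in embeddings
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)  # stand-in labels

X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]

# liblinear: second-order (trust-region Newton style) with a tolerance stop
clf_a = LogisticRegression(solver="liblinear", tol=1e-4).fit(X_train, y_train)
# a first-order solver with a looser stopping criterion
clf_b = LogisticRegression(solver="saga", tol=1e-2, max_iter=200).fit(X_train, y_train)

print(f1_score(y_test, clf_a.predict(X_test)))
print(f1_score(y_test, clf_b.predict(X_test)))
```

Both classifiers are "correct", but the scores typically differ in the low decimal places, which is the kind of gap you're seeing.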
OK great thanks. Are you able to give some more details about the experimental setup for those numbers? I have the following files for the youtube dataset:
~/.graphvite/dataset/youtube/
├── youtube_graph.txt
├── youtube-groupmemberships.txt
├── youtube-groupmemberships.txt.gz
├── youtube_label.txt
└── youtube-links.txt.gz
I'm guessing that Table 4 shows the results for predicting youtube_label.txt -- but that file only has 50767 entries for 31761 unique nodes, instead of |V| = 1,138,499 entries like I'd expect. Thoughts?
Thanks! ~ Ben
If I had to guess w/o digging through your code (yet :) )--
I'm guessing that you convert youtube_label.txt to a (num_labeled_examples, num_labels) binary matrix, then use k% of the rows of the matrix to train the classifier and the remaining (100 - k)% to validate it. In that case, "1% Labeled Nodes" would mean you used 31761 * 0.01 ≈ 317 nodes to train the classifier in the first column of Table 4.
Is that right? Or am I misunderstanding something?
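For reference, a minimal sketch of the split I'm describing, on toy data (all names here are mine, not GraphVite's):

```python
# Build a (num_labeled_nodes, num_labels) binary matrix from (node, label)
# pairs, then split k% of the ROWS for training. Toy data, illustrative only.
import numpy as np

pairs = [(0, 1), (0, 3), (1, 2), (2, 1), (3, 0), (4, 2)]  # (node, label)
nodes = sorted({n for n, _ in pairs})
labels = sorted({l for _, l in pairs})
node_idx = {n: i for i, n in enumerate(nodes)}
label_idx = {l: j for j, l in enumerate(labels)}

Y = np.zeros((len(nodes), len(labels)), dtype=np.int8)
for n, l in pairs:
    Y[node_idx[n], label_idx[l]] = 1      # multi-label: a node can have several 1s

k = 40  # percent of labeled nodes used for training (1..10 in Table 4)
rng = np.random.default_rng(0)
perm = rng.permutation(len(nodes))
n_train = max(1, int(len(nodes) * k / 100))
train_rows, test_rows = perm[:n_train], perm[n_train:]
print(Y[train_rows].shape, Y[test_rows].shape)  # → (2, 4) (3, 4)
```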
Yes you're totally right. This setting is exactly inherited from DeepWalk and LINE.
youtube-groupmemberships.txt is the raw label file, which contains a huge number of communities. Since most communities are really small and noisy, only a few large communities are used. For Youtube, it's the top-47 communities, following the DeepWalk paper. You can find the generation code here.
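In case it helps, the filtering step could look roughly like this. This is a sketch under my own assumptions about the preprocessing (the in-memory `memberships` list stands in for parsing youtube-groupmemberships.txt); the real generation code linked above is authoritative:

```python
# Keep only memberships in the top-k largest communities.
# Toy data; for the YouTube dataset top_k would be 47.
from collections import Counter

memberships = [(1, "a"), (2, "a"), (3, "a"), (1, "b"), (4, "c"), (5, "a"), (2, "b")]
top_k = 2

sizes = Counter(group for _, group in memberships)       # community -> size
kept = {g for g, _ in sizes.most_common(top_k)}          # largest communities
filtered = [(node, g) for node, g in memberships if g in kept]
print(filtered)
```

Nodes whose only communities are filtered out end up unlabeled, which would explain why youtube_label.txt covers 31761 nodes rather than all of |V|.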
Personally I guess the original authors used such a small evaluation set simply because they were running liblinear on CPU.