node2vec

something wrong with random walk in node2vec_spark?

Open rainbow2301 opened this issue 7 years ago • 2 comments

val edge2attr = graph.triplets.map { edgeTriplet => (s"${edgeTriplet.srcId}${edgeTriplet.dstId}", edgeTriplet.attr) }.repartition(200).cache

(s"${prevNodeId}${currentNodeId}", (srcNodeId, pathBuffer)) }.join(edge2attr).map { case (edge, ((srcNodeId, pathBuffer), attr)) =>

In the code, the join key is generated by s"${edgeTriplet.srcId}${edgeTriplet.dstId}". Do we need a separator between the two elements?

rainbow2301, May 25 '18 03:05

Actually, yes. You should use s"${edgeTriplet.srcId}\t${edgeTriplet.dstId}" instead!

wl142857, Mar 05 '19 10:03

Yes. If you don't add a separator, the edge between node #1 and node #1111 gets the same key as the edge between node #11 and node #111: both become '11111'. With a separator such as \t, the keys are 1\t1111 and 11\t111, which are distinct.
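The collision can be reproduced without Spark. This is only a minimal sketch: the object name EdgeKeyDemo and the helpers concatKey and tabKey are made up for illustration and are not part of node2vec_spark. They just mirror the two key formats discussed in this thread.

```scala
object EdgeKeyDemo {
  // Key built the way the issue quotes it: the two ids concatenated, no separator.
  // (Hypothetical helper, not from the project.)
  def concatKey(srcId: Long, dstId: Long): String = s"${srcId}${dstId}"

  // Key with a tab between the ids, as suggested above.
  // (Hypothetical helper, not from the project.)
  def tabKey(srcId: Long, dstId: Long): String = s"${srcId}\t${dstId}"

  def main(args: Array[String]): Unit = {
    // Two different edges: 1 -> 1111 and 11 -> 111.
    // Without a separator both edges map to the same string.
    println(concatKey(1L, 1111L) == concatKey(11L, 111L))

    // With the tab separator the two keys no longer compare equal.
    println(tabKey(1L, 1111L) == tabKey(11L, 111L))
  }
}
```

Since the join matches rows by key equality, two edges that share a key would each pick up the other's attribute as well as their own, which fits the wrong edges described here.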

And I think that's why you got bad results when your data is very big: the bigger the data, the higher the chance of getting wrong edges.

liliangjie91, Dec 16 '19 07:12