node2vec
node2vec copied to clipboard
[node2vec_spark]Could somebody explain what this code is doing?
In GraphOps.scala:
def setupAlias(neighbours: Array[(Long, Double)]): (Array[Int], Array[Double]) = {
val K = neighbours.length
val J = Array.fill(K)(0)
val q = Array.fill(K)(0.0)
val smaller = new ArrayBuffer[Int]()
val larger = new ArrayBuffer[Int]()
val sum = neighbours.map(_._2).sum
neighbours.zipWithIndex.foreach { case ((nodeId, weight), i) =>
q(i) = K * weight / sum
if (q(i) < 1.0) {
smaller.append(i)
} else {
larger.append(i)
}
}
while (smaller.nonEmpty && larger.nonEmpty) {
val small = smaller.remove(smaller.length - 1)
val large = larger.remove(larger.length - 1)
J(small) = large
q(large) = q(large) + q(small) - 1.0
if (q(large) < 1.0) smaller.append(large)
else larger.append(large)
}
(J, q)
}
What does this function do? I know that the J, q is used for transition probability, but what is J and q exactly and how they are related with transition probability?
Let me answer my own question. This is a sample method called "Alias Method", which has a O(1) time complexity for sample and O(n) time complexity for construct the alias table (a data structure needed for this sample method). More details: https://www.keithschwarz.com/darts-dice-coins/