graphtools icon indicating copy to clipboard operation
graphtools copied to clipboard

Landmark graph size does not match n_landmark

Open stanleyjs opened this issue 4 years ago • 0 comments

Hi,

I noticed that master fails test_landmark.test_landmark_knn_graph and test_landmark.test_landmark_knn_pygsp_graph.

The reason these tests fails is due to the shape of the landmark operator. The tests expect the landmark transitions operator to be (data.shape[0], n_landmark) and the landmark operator itself to be (n_landmark, n_landmark). I looked into why this is not true. It turns out that the current way of building clusters, MiniBatchKMeans, is not assigning to all n_landmark clusters, so you can have landmark graphs with <= n_landmark nodes. This happens in the tests. I verified by changing the random seed and checking len(np.unique(G.clusters)), and indeed the size of the cluster assignments changes based on the seed.

There's two ways to fix this bug.

  1. It's working as intended: n_landmarks is an upper bound, rather than an exact target. In this case, we just need to change the tests to reflect this.
  2. It's not working as intended: n_landmarks should be the exact size of the landmark graph. In this case, we need to either a) change MinibatchKMeans to an algorithm that assigns all clusters 100% of the time, or figure out which parameters of MinibatchKMeans ensures this.

stanleyjs avatar Jan 03 '22 17:01 stanleyjs