
How to do link prediction on a large graph without 'MemoryError'

Open dbvdb opened this issue 6 years ago • 3 comments

Hi, I am using this library. What I really want to do is load a large graph and use SimRank for link prediction, but I get the following error:

Traceback (most recent call last):
  File "pred.py", line 12, in <module>
    results = model.predict(c=0.4)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/predictors/base.py", line 64, in predict_and_postprocess
    scoresheet = func(*args, **kwargs)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/predictors/eigenvector.py", line 88, in predict
    sim = simrank(self.G, nodelist, c, num_iterations, weight)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/network/algorithms.py", line 73, in simrank
    M = raw_google_matrix(G, nodelist=nodelist, weight=weight)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/network/algorithms.py", line 87, in raw_google_matrix
    weight=weight)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/networkx/convert_matrix.py", line 369, in to_numpy_matrix
    M = np.zeros((nlen,nlen), dtype=dtype, order=order) + np.nan
MemoryError

Could you help me?

dbvdb avatar Apr 10 '19 09:04 dbvdb

What version of networkx is this? Please make sure you have version 2.1 or more recent, and check whether that solves the problem. The current code in networkx seems more memory-efficient.
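To check which networkx version is actually installed in the environment that runs the script, a quick sketch:

```python
import networkx as nx

# Print the installed networkx version; it should be 2.1 or newer.
print(nx.__version__)
```

If the version is older, `pip install --upgrade networkx` inside the same virtualenv should bring it up to date.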

What could be done in linkpred is to let raw_google_matrix return a scipy sparse matrix. That should trim down the memory usage of this part of the code. I can imagine, however, that very large graphs will remain hard to process, mainly because linkpred uses networkx data structures for graphs, which are implemented in pure Python.
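As an illustration of that suggestion (not linkpred's actual `raw_google_matrix` implementation), here is a minimal sketch of building a row-normalized transition matrix as a scipy sparse matrix instead of a dense numpy one. The function name `sparse_google_matrix` is hypothetical, and `nx.to_scipy_sparse_array` assumes a recent networkx (older versions exposed `to_scipy_sparse_matrix` instead):

```python
import networkx as nx
import numpy as np
from scipy import sparse

def sparse_google_matrix(G, nodelist=None, weight="weight"):
    """Sketch: row-normalized adjacency matrix kept in sparse CSR form.

    Hypothetical replacement for a dense matrix build; only nonzero
    entries are stored, so memory scales with the number of edges
    rather than the square of the number of nodes.
    """
    # Requires networkx >= 2.7; older versions: nx.to_scipy_sparse_matrix
    A = nx.to_scipy_sparse_array(G, nodelist=nodelist, weight=weight,
                                 format="csr")
    out_weight = np.asarray(A.sum(axis=1)).ravel()
    # Avoid division by zero for dangling nodes (no out-edges)
    out_weight[out_weight == 0] = 1.0
    # Multiply each row by 1/row-sum; the result stays sparse
    return sparse.diags(1.0 / out_weight) @ A
```

For a graph with n nodes and m edges this stores O(m) values instead of the O(n²) floats a dense matrix needs, which is exactly where the `np.zeros((nlen, nlen), ...)` allocation in the traceback blows up.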

rafguns avatar Apr 16 '19 13:04 rafguns

I used networkx==2.3 and linkpred==0.4.1

dbvdb avatar Apr 18 '19 06:04 dbvdb

Are you sure about networkx? I am asking because this line in your stack trace

  File "/home/danial/Envs/graph/lib/python3.6/site-packages/networkx/convert_matrix.py", line 369, in to_numpy_matrix
    M = np.zeros((nlen,nlen), dtype=dtype, order=order) + np.nan

does not occur in current versions of networkx (it changed in networkx/networkx@8de67c08).

But this point is somewhat moot, because it will probably not fundamentally change the memory usage issue. See my previous comment for some thoughts on that.

rafguns avatar Apr 30 '19 19:04 rafguns