How to do link prediction on a large graph without 'MemoryError'
Hi,
I used this library, and what I really want to do is load a large graph and use SimRank for link prediction, but I get the following error:
```
Traceback (most recent call last):
  File "pred.py", line 12, in <module>
    results = model.predict(c=0.4)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/predictors/base.py", line 64, in predict_and_postprocess
    scoresheet = func(*args, **kwargs)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/predictors/eigenvector.py", line 88, in predict
    sim = simrank(self.G, nodelist, c, num_iterations, weight)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/network/algorithms.py", line 73, in simrank
    M = raw_google_matrix(G, nodelist=nodelist, weight=weight)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/linkpred/network/algorithms.py", line 87, in raw_google_matrix
    weight=weight)
  File "/home/danial/Envs/graph/lib/python3.6/site-packages/networkx/convert_matrix.py", line 369, in to_numpy_matrix
    M = np.zeros((nlen,nlen), dtype=dtype, order=order) + np.nan
MemoryError
```
Could you help me?
What version of networkx is this? Please make sure you have version 2.1 or more recent, and check whether that solves the problem. The current code in networkx seems more efficient.
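For a sense of scale: the line that fails allocates a dense n × n float64 matrix, so memory grows quadratically with the number of nodes. A rough back-of-the-envelope estimate (the node count here is just an illustrative assumption):

```python
# Memory needed for the dense n x n float64 matrix that
# to_numpy_matrix allocates in the traceback above.
n = 100_000  # hypothetical node count, for illustration only

bytes_needed = n * n * 8  # 8 bytes per float64 entry
print(f"{bytes_needed / 1024**3:.0f} GiB")  # ~75 GiB for 100k nodes
```

So even a graph of a few hundred thousand nodes cannot fit this dense matrix in ordinary RAM, regardless of the networkx version.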
What could be done in linkpred is to let raw_google_matrix return a scipy sparse matrix. That should trim down the memory usage of this part of the code. I can imagine, however, that very large graphs will remain hard to process, mainly because linkpred uses networkx data structures for graphs, which are implemented in pure Python.
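A minimal sketch of that idea, row-normalising a sparse adjacency matrix instead of building the dense one. The function name, signature, and dangling-node handling are illustrative assumptions, not linkpred's actual API:

```python
import networkx as nx
import numpy as np

def sparse_google_matrix(G, nodelist=None, weight="weight"):
    """Hypothetical sparse variant of linkpred's raw_google_matrix.

    Builds a row-normalised sparse transition matrix instead of the
    dense numpy matrix that triggers the MemoryError.
    """
    # Requires networkx >= 2.7 for to_scipy_sparse_array; older
    # versions offer to_scipy_sparse_matrix instead.
    A = nx.to_scipy_sparse_array(G, nodelist=nodelist, weight=weight, format="csr")
    out_degree = A.sum(axis=1)
    out_degree[out_degree == 0] = 1.0  # avoid division by zero for sink nodes
    # Divide each row by its out-degree; sink rows stay all-zero in
    # this sketch (real code would also handle dangling nodes).
    return A.multiply(1.0 / out_degree[:, np.newaxis]).tocsr()
```

For a CSR matrix, memory scales with the number of edges rather than the square of the number of nodes, which is the whole point for sparse real-world graphs.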
I used networkx==2.3 and linkpred==0.4.1
Are you sure about networkx? I am asking because this line in your stack trace
```
File "/home/danial/Envs/graph/lib/python3.6/site-packages/networkx/convert_matrix.py", line 369, in to_numpy_matrix
    M = np.zeros((nlen,nlen), dtype=dtype, order=order) + np.nan
```
does not occur in current versions of networkx (it changed in networkx/networkx@8de67c08).
But this point is a bit moot, because upgrading will probably not fundamentally change the memory-usage issue. See my previous comment for some thoughts on that.