Simplify evaluation to plain functions

Open • rafguns opened this issue 10 years ago • 1 comment

sklearn.metrics is in a way much simpler, using plain functions. Can we do something analogous, or even depend on scikit-learn for things like ROC, recall-precision etc.?

rafguns, Feb 13 '15 14:02

Looking at this again. The main issue is that linkpred uses its own data structure (Scoresheet) to track prediction scores. This has at least two advantages:

  1. Order of nodes is never a problem: (a, b) == (b, a), and the ranking of pairs with the same score is deterministic.
  2. Only node pairs for which there is a prediction need to be tracked, which is less memory-intensive. This is especially a concern for larger networks, e.g. 5000 nodes yield 12,497,500 node pairs.
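To make the size argument in point 2 concrete, here is a rough sketch of the arithmetic. The `sparse_scores` dict is a made-up stand-in for a sparse store and is not how Scoresheet is actually implemented.

```python
import math

n_nodes = 5000

# Number of unordered node pairs in an undirected network without self-loops.
n_pairs = math.comb(n_nodes, 2)
print(n_pairs)  # 12497500

# Hypothetical sparse store: only pairs that received a prediction get an entry.
sparse_scores = {("a", "b"): 0.9, ("a", "c"): 0.4}

# A dense layout needs one slot per possible pair, predicted or not.
print(len(sparse_scores), "entries stored vs", n_pairs, "slots in a dense array")
```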

Point 2 in particular is fundamentally different from scikit-learn, whose metric functions expect array-like inputs with one entry per sample.

The way forward is probably to replace Scoresheet with a pandas Series, whose keys are all node pairs and whose values are scores. The index could be built prior to evaluation:

idx = pd.MultiIndex.from_tuples(itertools.combinations(G.nodes(), 2))

and shared across evaluations. The underlying numpy array could then be passed to scikit-learn metrics.
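A minimal sketch of how that might look end to end, on toy data. The names `nodes`, `predictions` and `test_edges` are invented for illustration, and giving unpredicted pairs a score of 0.0 is an assumption rather than anything linkpred currently does.

```python
import itertools

import pandas as pd

# Hypothetical toy data; in linkpred the nodes would come from G.nodes().
nodes = ["a", "b", "c", "d"]
predictions = {("a", "b"): 0.9, ("b", "d"): 0.4}  # predicted pairs only
test_edges = {("a", "b"), ("c", "d")}             # pairs that really are links

# Build the index once; it can be reused for every predictor being evaluated.
idx = pd.MultiIndex.from_tuples(list(itertools.combinations(nodes, 2)))

# Dense scores: pairs without a prediction default to 0.0 (an assumption).
scores = pd.Series([predictions.get(pair, 0.0) for pair in idx], index=idx)

# Ground truth aligned on the same index: 1 if the pair is a real link.
truth = pd.Series([int(pair in test_edges) for pair in idx], index=idx)

# Plain arrays in matching order, ready for array-based metric functions.
y_score = scores.to_numpy()
y_true = truth.to_numpy()
```

With arrays in this shape, something like `sklearn.metrics.roc_auc_score(y_true, y_score)` or `sklearn.metrics.precision_recall_curve(y_true, y_score)` should be applicable directly. The cost is that the Series holds one entry per possible pair, which is exactly the memory trade-off from point 2.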

I am not yet sure how best to deal with point 1, or to what extent it is actually a problem.
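One possible way to handle point 1 with a Series, sketched under the assumption that node labels are mutually comparable (sortable): build the index from sorted nodes so every entry is already in canonical order, normalise pairs before lookup, and break score ties on the pair itself. The helper `canonical` is hypothetical and not part of linkpred.

```python
import itertools

import pandas as pd


def canonical(u, v):
    """Return the pair in sorted order so that (u, v) and (v, u) coincide."""
    return (u, v) if u <= v else (v, u)


# Sorting the nodes first means every index entry is (smaller, larger).
nodes = sorted(["c", "a", "b"])
idx = pd.MultiIndex.from_tuples(list(itertools.combinations(nodes, 2)))
scores = pd.Series([0.5, 0.7, 0.5], index=idx)  # made-up scores with a tie

# Lookups give the same result whichever way round the caller names the nodes.
same = scores.loc[canonical("b", "a")] == scores.loc[canonical("a", "b")]

# Deterministic ranking: descending score, ties broken by the pair itself.
ranking = [pair for pair, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
print(ranking)  # [('a', 'c'), ('a', 'b'), ('b', 'c')]
```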

rafguns, Nov 06 '19 15:11