
EvaluationSheet: fully understand the "universe" term

Open TaousDev opened this issue 5 years ago • 1 comments

I don't think I understand the "universe" term that is used as a parameter, or how to choose it, in linkpred/evaluation/static/StaticEvaluation() and in EvaluationSheet(). You stated that this parameter is important for returning the accuracy.

Also, how do I get the confusion matrix, recall, precision, and accuracy?

Concerning the accuracy, do I pick the max value, like this: evaluation.accuracy().max()? Or is that wrong, and should I do this instead: acc = (sum(evaluation.tp + evaluation.tn))/(sum(evaluation.tp + evaluation.tn + evaluation.fp + evaluation.fn)) (I also imported division from __future__)?

I want to use sklearn, but what is confusing me is how to retrieve y_true and y_pred from a graph for sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None). How do I get these data from the graph so that I can use them in other machine learning algorithms such as SVM?

This is my full code:


```python
import linkpred
import random
from matplotlib import pyplot as plt

random.seed(100)

# Read network
G = linkpred.read_network('BUP_full.net')

# Create test network
test = G.subgraph(random.sample(G.nodes(), 33))

# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())

simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
simrank_results = simrank.predict(c=0.5)

test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(simrank_results, test_set, simrank_results)

plt.plot(evaluation.recall(), evaluation.precision())
```

Thank you

TaousDev, Sep 12 '20 21:09

The universe parameter is an iterable (typically a list or set) of all possible links (i.e. all node pairs) in the graph. Because the number of node pairs grows quadratically with the number of nodes, it can also simply be the number of node pairs (an int). So in your example, I think you could use:

```python
n = len(training)
universe = n * (n - 1) // 2
```

With the benefit of hindsight, this was a premature optimization that would probably require some fairly substantial work to get rid of. I'll get back to your other questions soon.
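To illustrate why the universe size matters for accuracy, here is a minimal sketch in plain Python that does not use linkpred at all. The helper `confusion_counts`, the toy pairs, and the cutoff `k` are all hypothetical; node pairs are assumed to be stored as sorted tuples so that (u, v) and (v, u) compare equal. The true negatives are whatever is left of the universe once the other three cells are counted, which is why they cannot be computed without it.

```python
def confusion_counts(ranked_predictions, test_set, universe_size, k):
    """Confusion-matrix counts when the top-k predictions count as positives.

    Hypothetical helper for illustration only; not part of linkpred.
    """
    predicted = set(ranked_predictions[:k])
    tp = len(predicted & test_set)       # predicted and present in the test set
    fp = len(predicted) - tp             # predicted but absent from the test set
    fn = len(test_set) - tp              # in the test set but not predicted
    tn = universe_size - tp - fp - fn    # every remaining node pair
    return tp, fp, fn, tn


# Toy data: 5 nodes, so 5 * 4 // 2 = 10 possible undirected node pairs.
n = 5
universe = n * (n - 1) // 2

# Predictions sorted from most to least likely (assumed ordering).
ranked = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e")]
test_set = {("a", "b"), ("b", "d"), ("d", "e")}

tp, fp, fn, tn = confusion_counts(ranked, test_set, universe, k=3)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / universe

print(tp, fp, fn, tn, precision, recall, accuracy)
```

Each cutoff `k` gives its own set of counts, so a single accuracy figure is only meaningful once a cutoff has been chosen.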

rafguns, Sep 24 '20 15:09
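On the y_true / y_pred question, one possible approach (a sketch under stated assumptions, not something confirmed in this thread) is to enumerate every node pair in the universe and label each pair twice: once by whether it is in the test set, and once by whether it was predicted. The node names, `test_edges`, and `predicted_edges` below are made-up toy data, and pairs are again assumed to be sorted tuples.

```python
from itertools import combinations

# Toy data (hypothetical): four nodes, two true test links, two predicted links.
nodes = ["a", "b", "c", "d"]
test_edges = {("a", "b"), ("c", "d")}
predicted_edges = {("a", "b"), ("a", "c")}

# The universe: every unordered node pair, in a fixed order.
pairs = list(combinations(sorted(nodes), 2))

# One binary label per pair, aligned by position.
y_true = [int(pair in test_edges) for pair in pairs]
y_pred = [int(pair in predicted_edges) for pair in pairs]

print(pairs)
print(y_true)
print(y_pred)
```

Two aligned lists like these have the shape that sklearn.metrics.confusion_matrix(y_true, y_pred) expects. If the predictor was told to exclude the training edges, it may make sense to drop those pairs from `pairs` as well, so that both labels cover the same universe.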