Why is it so slow to compute the METEOR score?
Hi, @salaniz. I compute METEOR and ROUGE scores, but waiting for the METEOR result takes surprisingly long. Could you tell me why? Thanks!
Here is the code for reproduction, if it helps.
```python
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge


def evaluate_coco(ref_data, hyp_data):
    scorer_meteor = Meteor()
    scorer_rouge = Rouge()
    # Both scorers expect {image_id: [caption, ...]} dicts.
    ref_data = [[ref_datum] for ref_datum in ref_data]
    hyp_data = [[hyp_datum] for hyp_datum in hyp_data]
    ref = dict(zip(range(len(ref_data)), ref_data))
    hyp = dict(zip(range(len(hyp_data)), hyp_data))
    print("coco meteor score ...")
    coco_meteor_score = scorer_meteor.compute_score(ref, hyp)[0]
    print("coco rouge score ...")
    coco_rouge_score = float(scorer_rouge.compute_score(ref, hyp)[0])
    return coco_meteor_score, coco_rouge_score


def main():
    ref_data = ['there is a cat on the mat']
    hyp_data = ['the cat is on the mat']
    print(evaluate_coco(ref_data, hyp_data))


if __name__ == '__main__':
    main()
```
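To see which metric is actually the bottleneck, it may help to time each scorer separately. A minimal sketch, assuming any object with the pycocoevalcap-style `compute_score(ref, hyp)` interface (`time_scorer` is a hypothetical helper name, not part of the library):

```python
import time


def time_scorer(scorer, ref, hyp):
    """Run scorer.compute_score(ref, hyp) once and report wall-clock time."""
    start = time.perf_counter()
    score = scorer.compute_score(ref, hyp)[0]
    elapsed = time.perf_counter() - start
    print(f"{type(scorer).__name__}: {elapsed:.2f}s")
    return score, elapsed
```

Calling it as `time_scorer(Meteor(), ref, hyp)` and `time_scorer(Rouge(), ref, hyp)` would show whether the time really goes into METEOR.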
Hi, I'm running into the same issue. In my case the METEOR computation takes hours. Did you find the cause? When I kill the process, I get the following stack trace.
```
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Input In [10], in <cell line: 22>()
     20 final_scores[method] = score
     21 return final_scores
---> 22 calc_scores(corpus, references)

Input In [10], in calc_scores(ref, hypo)
     13 final_scores = {}
     14 for scorer, method in scorers:
---> 15     score, scores = scorer.compute_score(ref, hypo)
     16     if type(score) == list:
     17         for m, s in zip(method, score):

File ~/anaconda3/envs/gpt/lib/python3.9/site-packages/pycocoevalcap/meteor/meteor.py:40, in Meteor.compute_score(self, gts, res)
     37     stat = self._stat(res[i][0], gts[i])
     38     eval_line += ' ||| {}'.format(stat)
---> 40 self.meteor_p.stdin.write('{}\n'.format(eval_line).encode())
     41 self.meteor_p.stdin.flush()
     42 for i in range(0, len(imgIds)):

KeyboardInterrupt:
```
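The trace shows the interpreter blocked while writing to the METEOR subprocess's stdin. pycocoevalcap runs METEOR as a Java subprocess, so if the Java process dies or stalls, the pipe write can block indefinitely. A minimal sanity check that Java is even on the `PATH` (`java_available` is a hypothetical helper, not part of the library):

```python
import shutil


def java_available():
    """pycocoevalcap's Meteor pipes input to a Java jar; if `java` is
    missing or broken, writes to the subprocess pipe can hang."""
    return shutil.which("java") is not None
```

If this returns `False`, installing a JRE (or fixing `PATH`) would be the first thing to try before digging deeper.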
Calculating the METEOR score does take some time, but it shouldn't take hours. Can you try to run the `example/coco_eval_example.py` script and report your runtime?
Of course, it scales with larger datasets, but even on the whole COCO validation set, evaluating all metrics should not take more than a couple of minutes.
EDIT: @jianguda, timing your code, it took around 8.6 seconds on my machine.
A problem with NLTK or one of its sub-components can make METEOR appear stuck for hours. Perhaps your machine cannot download one of these packages:
```
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\xx\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\xx\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\xx\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
```
The NLTK word tokenizer itself can also slow the process down.
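If the stall really is an NLTK data download (e.g. behind a firewall or proxy), pre-fetching the packages once before scoring avoids blocking mid-run. A minimal sketch using NLTK's standard downloader (`prefetch_nltk_data` is a hypothetical helper name):

```python
import nltk


def prefetch_nltk_data(packages=("wordnet", "punkt", "omw-1.4")):
    """Download the corpora NLTK-based tokenization may need ahead of
    time, so scoring never blocks on a network fetch mid-run."""
    for pkg in packages:
        nltk.download(pkg, quiet=True)
```

Running this once in a setup step (with network access) should make the `[nltk_data]` messages above report everything as up-to-date during evaluation.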