If the number of reference (ground-truth) captions does not equal the number of predicted captions, the BLEU evaluation cannot be performed.
from nltk.translate.bleu_score import corpus_bleu
assert len(references) == len(hypotheses)
# Calculate BLEU-4 scores
bleu4 = corpus_bleu(references, hypotheses)
Hey @hubin111
The situation you're describing is impossible! For each image, we will always have exactly one generated caption (hypothesis), so hypotheses is a flat list [hyp1, hyp2, ...]. Correspondingly, references is a list of lists [[ref1a, ref1b, ref1c, ref1d, ref1e], [ref2a, ref2b, ref2c, ref2d, ref2e], ...], with one inner list per image.
Now, if you compute the lengths of these two lists, they will match.
This assert statement
assert len(references) == len(hypotheses)
just checks for whether we generated one caption for each image.
So, I don't see where the issue is coming from.
@kmario23 What you are suggesting makes complete sense. However, the COCO dataset has 5 ground-truth captions per image. Since our model generates only one caption per image, how can the lengths be equal?
The Python assertion checks lengths only at the top level: the 5 references for an image are grouped into a single inner list, which counts as one element. See the example in my description above.
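To make the point concrete, here is a minimal sketch (assuming NLTK's corpus_bleu, and with made-up toy captions) showing that the top-level lengths match even with 5 references per image, because each image's references are grouped into one inner list:

```python
from nltk.translate.bleu_score import corpus_bleu

# One generated caption (hypothesis) per image -- a flat list of token lists.
hypotheses = [
    ["a", "dog", "runs", "on", "the", "grass"],
    ["a", "man", "rides", "a", "red", "bike"],
]

# Five ground-truth captions per image -- a list of lists of token lists.
# (Toy data: each image's 5 references are grouped into ONE inner list.)
references = [
    [["a", "dog", "runs", "on", "the", "grass"]] * 5,
    [["a", "man", "rides", "a", "red", "bike"]] * 5,
]

# Top level: one entry per image on both sides, so the lengths match
# (2 == 2), even though there are 10 reference captions in total.
assert len(references) == len(hypotheses)

bleu4 = corpus_bleu(references, hypotheses)  # default weights = BLEU-4
```

So the assert never compares 5 references against 1 hypothesis; it only checks that there is one reference group per generated caption.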
@kmario23 Yup, understood. Thanks :D