Reproduce image captioning results
I downloaded the BLIP checkpoint finetuned on MSCOCO and evaluated it on the MSCOCO Karpathy test split. The results below are far from the numbers reported in the paper.
{"val_Bleu_1": 0.7880731765264753, "val_Bleu_2": 0.639069559164176, "val_Bleu_3": 0.5072596931307983, "val_Bleu_4": 0.4004978708030138, "val_METEOR": 0.30870782546912806, "val_ROUGE_L": 0.5995167829757403, "val_CIDEr": 1.3235634828926905, "val_SPICE": 0.23770878020265654, "test_Bleu_1": 0.7890154986254275, "test_Bleu_2": 0.6375740092772626, "test_Bleu_3": 0.5048209676350459, "test_Bleu_4": 0.3967920738886881, "test_METEOR": 0.3095324603809636, "test_ROUGE_L": 0.5997370737017899, "test_CIDEr": 1.3332255300867943, "test_SPICE": 0.23785686584752766}
What should I do if I want to reproduce the results?
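One thing that may be worth checking before rerunning anything is the scale: the scores above are fractions on a 0-1 scale, whereas captioning papers commonly report these metrics multiplied by 100. Below is a minimal sketch, assuming the paper's table uses that x100 convention (I have not confirmed this). The helper name `to_paper_scale` is hypothetical, and the values are copied from the test-split output above.

```python
import json

# Test-split scores copied from the evaluation output above (0-1 scale).
raw = """{"test_Bleu_4": 0.3967920738886881,
          "test_METEOR": 0.3095324603809636,
          "test_CIDEr": 1.3332255300867943,
          "test_SPICE": 0.23785686584752766}"""


def to_paper_scale(metrics, ndigits=1):
    """Multiply each score by 100 and round.

    Assumption: the paper's tables report metrics x100. This helper is
    hypothetical and is not part of the BLIP codebase.
    """
    return {name: round(value * 100, ndigits) for name, value in metrics.items()}


scaled = to_paper_scale(json.loads(raw))
for name, value in scaled.items():
    print(f"{name}: {value}")
```

If the rescaled numbers still differ from the paper, the remaining gap would more likely come from the checkpoint, the decoding settings, or the evaluation split.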