
VinVL captioning: od_labels and features sizes do not match

Open ruotianluo opened this issue 4 years ago • 2 comments

I downloaded the VinVL captioning data from https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md.

When I run captioning training, I find that the size of the features (https://github.com/microsoft/Oscar/blob/master/oscar/run_captioning.py#L138) and the size of label_info (https://github.com/microsoft/Oscar/blob/master/oscar/run_captioning.py#L124) don't match. In principle, shouldn't both equal the number of detected objects?

I tried the old Oscar features, and those match.

ruotianluo, Jun 02 '21 05:06

Hi,

The region features do not need to have strict 1-to-1 correspondence with the object labels. In fact, it is acceptable to use different confidence thresholds, or even different models, to extract region features and labels. Btw, if you look at the maximum lengths of the image input and object label input, they are also different.
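To illustrate why the two counts can differ, here is a minimal sketch, not the actual Oscar code: region features and label tokens are clipped or padded independently, each to its own maximum length. The function names and the length limits below are hypothetical and chosen only for this example.

```python
from typing import Any, List, Sequence, Tuple


def pad_or_truncate(items: Sequence[Any], max_len: int, pad_value: Any) -> Tuple[List[Any], List[int]]:
    """Clip `items` to `max_len`, pad the remainder, and return (padded, attention_mask)."""
    clipped = list(items[:max_len])
    n_real = len(clipped)
    padded = clipped + [pad_value] * (max_len - n_real)
    mask = [1] * n_real + [0] * (max_len - n_real)
    return padded, mask


def build_inputs(region_feats, od_labels, max_img_len, max_label_len):
    """Hypothetical helper: handle both streams separately, with no 1-to-1 alignment."""
    feat_dim = len(region_feats[0]) if region_feats else 0
    zero_row = [0.0] * feat_dim  # padding row for missing regions
    feats, feat_mask = pad_or_truncate(region_feats, max_img_len, zero_row)
    labels, label_mask = pad_or_truncate(od_labels, max_label_len, "[PAD]")
    return feats, feat_mask, labels, label_mask


# Three region features but five object labels: the counts differ, and that is fine.
region_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
od_labels = ["dog", "grass", "ball", "tree", "sky"]

feats, feat_mask, labels, label_mask = build_inputs(
    region_feats, od_labels, max_img_len=4, max_label_len=3  # illustrative limits only
)
print(len(feats), feat_mask)   # image stream length and mask
print(labels, label_mask)      # label stream and mask
```

Each stream only has to fit its own length budget, so nothing in this layout requires the number of regions to equal the number of labels.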

xiaoweihu, Jun 24 '21 22:06

Do you know which model was used to obtain the label_info for the COCO_Caption dataset?

ckmstydy, Aug 12 '21 06:08