xiaoweihu comments

Results: 6 comments by xiaoweihu

vinvl captioning: od_labels and features size not match

Hi, the region features do not need a strict 1-to-1 correspondence with the object labels. In fact, it is acceptable to use different confidence thresholds, or even different models,...
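As a rough illustration of the point above, the sketch below filters the same set of detections with two different confidence thresholds, one for the object labels and one for the region features, so the two sequences end up with different lengths. The detections, thresholds, helper names, and the `model_input` layout are all hypothetical and are not taken from the VinVL or Oscar code.

```python
# Hypothetical detector output: (label, confidence, region feature vector).
detections = [
    ("dog", 0.92, [0.1, 0.3]),
    ("frisbee", 0.55, [0.7, 0.2]),
    ("grass", 0.31, [0.4, 0.4]),
    ("tree", 0.12, [0.9, 0.5]),
]


def select_labels(dets, min_conf):
    """Keep the label strings whose confidence is at least min_conf."""
    return [label for label, conf, _ in dets if conf >= min_conf]


def select_features(dets, min_conf):
    """Keep the feature vectors whose confidence is at least min_conf."""
    return [feat for _, conf, feat in dets if conf >= min_conf]


# Assumed thresholds: stricter for the tags, looser for the features.
od_labels = select_labels(detections, min_conf=0.5)
region_feats = select_features(detections, min_conf=0.2)

# The tags and the region features are separate input segments in this
# sketch, so nothing requires them to have the same length.
model_input = {"od_labels": " ".join(od_labels), "img_feats": region_feats}

print(len(od_labels), len(region_feats))
```

Here the label segment holds two tags while the feature segment holds three regions, which is the kind of size mismatch the reply describes as acceptable.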

Unable to Reproduce the results for VinVL+VIVO on NoCaps

Hi, this repo is for VinVL pre-training. The VIVO pre-training loss differs from the one used in VinVL; for example, VIVO uses Hungarian matching. Hope this helps. Best, Xiaowei

Clarification of results

The caption evaluation scores are calculated by the official COCO evaluation code, which is a submodule in this repo. You may check the evaluation code for details of CIDEr.

where is train.fea.penzhan2.lab.oid_X152_min10.yaml?

Hi, the train_yaml is "train.yaml"; you don't need other YAML files for training. Please find the dataset for COCO captioning at https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md#datasets

About od model to generate coco_caption features. I cannot reproduce your feature results.

Hi, the od features are generated from the model you linked [vinvl_vg_x152c4.pth](https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth). If you need the features on COCO or nocaps, you can download the pre-extracted features and labels at...

object tags

Yes. The text inputs use the same word embedding.
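A minimal sketch of what sharing a word embedding could look like, assuming the question was whether object tags and caption text are embedded the same way: both token sequences are looked up in one table. The vocabulary, embedding size, and `embed` helper are hypothetical and only illustrate the idea, not the actual model code.

```python
import random

random.seed(0)

# Hypothetical toy vocabulary shared by caption words and object tags.
vocab = {"[PAD]": 0, "a": 1, "dog": 2, "catches": 3, "frisbee": 4}
EMBED_DIM = 4

# One embedding table, with one row per vocabulary entry.
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in vocab
]


def embed(tokens):
    """Look up each token in the single shared embedding table."""
    return [embedding_table[vocab[token]] for token in tokens]


caption_vecs = embed(["a", "dog", "catches", "a", "frisbee"])
tag_vecs = embed(["dog", "frisbee"])

# "dog" maps to the same row whether it appears as a caption word or a tag.
print(caption_vecs[1] == tag_vecs[0])
```

Because both segments index the same table, a word such as "dog" gets an identical vector in the caption and in the tag list.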