xiaoweihu

6 comments by xiaoweihu

Hi, the region features do not need to correspond one-to-one with the object labels. In fact, it is acceptable to use different confidence thresholds, or even different models,...
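To illustrate why no alignment is needed: in an Oscar-style input, the object tags and the region features are fed to the model as two separate sequences, so each can be filtered on its own. Below is a minimal sketch using different thresholds for each. The `Detection` record, the `build_inputs` helper and the threshold values are hypothetical and are not part of the Oscar code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detector output (hypothetical): class label, confidence, region feature."""
    label: str
    score: float
    feature: List[float]


def build_inputs(
    tag_dets: List[Detection],
    feat_dets: List[Detection],
    tag_thresh: float,
    feat_thresh: float,
) -> Tuple[List[str], List[List[float]]]:
    """Filter object tags and region features independently.

    The two lists may come from different detectors and are not index-aligned,
    so their lengths are allowed to differ.
    """
    tags = [d.label for d in tag_dets if d.score >= tag_thresh]
    feats = [d.feature for d in feat_dets if d.score >= feat_thresh]
    return tags, feats


# Hypothetical detections; here the same detector supplies both inputs.
dets = [
    Detection("dog", 0.9, [0.1, 0.2]),
    Detection("ball", 0.4, [0.3, 0.1]),
    Detection("grass", 0.25, [0.0, 0.5]),
]
tags, feats = build_inputs(dets, dets, tag_thresh=0.5, feat_thresh=0.2)
print(tags)        # ['dog']
print(len(feats))  # 3
```

Passing a different detection list as `feat_dets` covers the "different models" case in the same way.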

Hi, this repo is for VinVL pre-training. The VIVO pre-training loss differs from the one used in VinVL; for example, VIVO uses Hungarian matching. Hope this helps. Best, Xiaowei
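For context, Hungarian matching makes a loss order-invariant: the masked tags are treated as an unordered set, and each prediction is paired with the target tag that gives the lowest total cost. Below is a minimal sketch of that idea, not VIVO's implementation. It assumes each masked position outputs a probability dict over tags and uses negative log-likelihood as the cost. For clarity it brute-forces all permutations, which finds the same minimum-cost assignment the Hungarian algorithm computes efficiently but is only practical for a handful of tags. The function name `matching_loss` is hypothetical.

```python
import itertools
import math
from typing import Dict, List


def matching_loss(pred_probs: List[Dict[str, float]], targets: List[str]) -> float:
    """Order-invariant tag loss (illustrative sketch).

    Assigns each masked position to exactly one target tag so that the total
    negative log-likelihood is minimal, then returns the mean per-position loss.
    """
    if len(pred_probs) != len(targets):
        raise ValueError("need one prediction per masked tag")
    eps = 1e-12  # avoid log(0) for tags the model gives zero probability
    best = math.inf
    # Brute force over all one-to-one assignments (a stand-in for Hungarian matching).
    for perm in itertools.permutations(range(len(targets))):
        cost = sum(
            -math.log(max(pred_probs[i].get(targets[j], 0.0), eps))
            for i, j in enumerate(perm)
        )
        best = min(best, cost)
    return best / len(targets)


# Position 0 predicts "dog", position 1 predicts "cat", but the targets are
# listed in the other order; matching still pairs them correctly.
preds = [{"dog": 0.9, "cat": 0.1}, {"dog": 0.2, "cat": 0.8}]
loss = matching_loss(preds, ["cat", "dog"])
print(loss)
```

With a fixed order, the same predictions would be scored against the wrong tags and receive a much larger loss. In practice one would call an optimized solver such as `scipy.optimize.linear_sum_assignment` on the cost matrix instead of enumerating permutations.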

The caption evaluation scores are computed with the official COCO evaluation code, which is included in this repo as a submodule. Please check that code for the details of how CIDEr is calculated.
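For orientation only, CIDEr as originally defined (Vedantam et al., 2015) averages the cosine similarity between TF-IDF weighted n-gram vectors of a candidate caption $c_i$ and its $m$ reference captions $s_{ij}$. Here $h_k(\cdot)$ is the count of n-gram $\omega_k$, $\Omega$ is the n-gram vocabulary, $I$ is the set of images, and typically $N = 4$ with $w_n = 1/N$. Note that the COCO evaluation code implements the CIDEr-D variant, which additionally clips n-gram counts, applies a Gaussian length penalty and scales the score by 10, so the code remains the authoritative reference.

```latex
\begin{aligned}
g_k(s_{ij}) &= \frac{h_k(s_{ij})}{\sum_{\omega_l \in \Omega} h_l(s_{ij})}
  \log\!\left(\frac{|I|}{\sum_{I_p \in I} \min\!\left(1, \sum_{q} h_k(s_{pq})\right)}\right) \\
\mathrm{CIDEr}_n(c_i, S_i) &= \frac{1}{m} \sum_{j=1}^{m}
  \frac{g^n(c_i) \cdot g^n(s_{ij})}{\lVert g^n(c_i) \rVert \, \lVert g^n(s_{ij}) \rVert} \\
\mathrm{CIDEr}(c_i, S_i) &= \sum_{n=1}^{N} w_n \, \mathrm{CIDEr}_n(c_i, S_i)
\end{aligned}
```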

Hi, please find the COCO caption dataset at https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md#datasets. Set train_yaml to "train.yaml"; no other YAML files are needed for training.

Hi, the OD (object detection) features are generated by the model you linked, [vinvl_vg_x152c4.pth](https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth). If you need the features for COCO or nocaps, you can download the pre-extracted features and labels at...

Yes. The text input uses the same word embedding.