Jennifer-6

8 issues by Jennifer-6

Is the word embedding of object tags the same embedding matrix as the word embedding of captions?

Can't find datasets/coco_caption/test.yaml

What is the input for nocaps inference: the image itself, or image features extracted through VinVL?

Where is the trained model for the nocaps task?

How long does it take to train the network?

How can the saliency maps be visualized?

![Screenshot, 2021-06-01](https://user-images.githubusercontent.com/61728321/120332525-69b8fc00-c321-11eb-8c2d-8d981495b102.png)