Extracting text for the single image

yurii-piets opened this issue

Sorry for duplicating the issue https://github.com/layumi/Image-Text-Embedding/issues/1, but could you please explain how I can extract text for a single input image? It is not clear to me what steps I need to take to get a text description of a single image.

Also, can I extract text for an external image, i.e., an image that was not included in the training and validation sets?

I would really appreciate any help.

Mar 22 '20 21:03 yurii-piets

Hi @yurii-piets, the code maps an image or a text input into one shared embedding space. Therefore, given an image, we can extract its image embedding (feature), and given a sentence, we can extract the corresponding text embedding (feature). More precisely, for an image input we extract a feature in the shared space, rather than a text feature.
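Because images and sentences land in the same space, one way to get "text" for an image is retrieval: embed a pool of candidate sentences and return the ones closest to the image feature. Below is a minimal sketch, assuming the image and sentence features have already been extracted with the model (the vectors in the example are toy stand-ins, not real model outputs) and that cosine similarity is used to compare them. `rank_sentences` is a hypothetical helper, not part of the repository.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length feature vector")
    return dot / (norm_a * norm_b)


def rank_sentences(
    image_emb: Sequence[float],
    sentence_embs: Sequence[Sequence[float]],
    sentences: Sequence[str],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the top_k candidate sentences whose
    shared-space features are most similar to the image feature."""
    scored = [(s, cosine(image_emb, e)) for s, e in zip(sentences, sentence_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy example: 3-d stand-in features (real features are much larger).
sentences = ["a dog runs on grass", "a man rides a bike", "a cat sleeps"]
sentence_embs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]
image_emb = [1.0, 0.0, 0.0]
print(rank_sentences(image_emb, sentence_embs, sentences, top_k=2))
```

Note that this only selects sentences from a pool you supply; it does not generate new captions.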

Yes, you can extract features for external images as well. However, keep in mind the data distribution of the external images.

If the images are collected from Flickr, choose the model pretrained on either Flickr30k or MSCOCO. If the images show pedestrians, choose the model pretrained on CUHK-PEDES.
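The model-selection advice can be encoded as a small lookup so that unsupported domains fail loudly instead of silently using a mismatched model. This is a minimal sketch: the dataset names come from the reply, but the domain keys (`"flickr"`, `"pedestrian"`) and the `suggest_pretrained` function are hypothetical labels, not anything defined in the repository.

```python
from typing import Dict, Tuple

# Hypothetical domain labels mapped to the pretraining datasets recommended in the reply.
PRETRAINED_BY_DOMAIN: Dict[str, Tuple[str, ...]] = {
    "flickr": ("Flickr30k", "MSCOCO"),
    "pedestrian": ("CUHK-PEDES",),
}


def suggest_pretrained(domain: str) -> Tuple[str, ...]:
    """Return the pretraining dataset(s) whose distribution should best match
    images from the given domain; raise if no recommendation exists."""
    key = domain.strip().lower()
    if key not in PRETRAINED_BY_DOMAIN:
        raise ValueError(
            f"no pretrained model recommended for domain {domain!r}; "
            "results may degrade if test images differ from the training data"
        )
    return PRETRAINED_BY_DOMAIN[key]


print(suggest_pretrained("Flickr"))
```

Raising on unknown domains reflects the point that the model is only expected to work well when test images resemble the training data.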

The model works well when the testing distribution is close to the training distribution.

Mar 23 '20 02:03 layumi