OFA Can OFA perform fast retrieval tasks like dual-stream models?

Hi there:

Can OFA perform fast retrieval tasks the way dual-stream models do? I want to get an embedding for each image or text, save it, and then do fast retrieval by computing cosine similarity between the saved embeddings, instead of feeding every text-image pair through the model.
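To make the intended workflow concrete, here is a minimal sketch of retrieval over saved embeddings. It assumes the embeddings already exist as plain lists of floats; the file names, the toy vectors, and the helper names (`l2_normalize`, `build_index`, `search`) are hypothetical and are not part of OFA.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings):
    """Normalize every saved embedding once, ahead of query time."""
    return {key: l2_normalize(vec) for key, vec in embeddings.items()}


def search(index, query, top_k=3):
    """Return the top_k (key, cosine similarity) pairs for a query embedding."""
    q = l2_normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), key) for key, vec in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:top_k]]


# Toy 3-dimensional embeddings (assumed values, for illustration only).
image_index = build_index(
    {
        "cat.jpg": [0.9, 0.1, 0.0],
        "dog.jpg": [0.1, 0.9, 0.0],
        "car.jpg": [0.0, 0.1, 0.9],
    }
)
results = search(image_index, [1.0, 0.2, 0.0], top_k=2)
for key, score in results:
    print(key, round(score, 3))
```

Because the index is normalized once, each query costs only one dot product per stored item, which is what makes this faster than running the model on every pair.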

I hope you can give me a hint!

Thanks a lot!

Aug 03 '22 12:08 JacobKong

@JacobKong Hi, I think you can use the hidden state of <BOS> on the encoder side, or the hidden state of <EOS> on the decoder side, as the embedding for retrieval. However, since the pre-training tasks of OFA do not include a contrastive loss, you would need to fine-tune OFA with a contrastive loss before applying it to the retrieval task.
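For illustration, one common form of contrastive loss is the symmetric InfoNCE objective used by CLIP-style dual-stream models. The sketch below is an assumption about what such fine-tuning might optimize, not code from OFA: it takes already-pooled image and text embeddings as plain lists, and the function names and the temperature of 0.07 are hypothetical choices. A real fine-tuning run would compute the same quantity with a tensor library so that gradients can flow back into the model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: pair k of images and texts is the positive pair,
    every other combination in the batch is a negative."""
    if not image_embs or len(image_embs) != len(text_embs):
        raise ValueError("need equally sized, non-empty batches")
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, scaled.
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    # Image-to-text direction: softmax over each row.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: softmax over each column.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)


# Matched pairs should give a much lower loss than mismatched pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Minimizing this loss pulls each image embedding toward its paired text embedding and away from the other texts in the batch, which is the property that cosine-similarity retrieval relies on.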

Aug 06 '22 04:08 logicwong

Thanks a lot for the reply! So should I input a single image, or both the image and an instruction text, to the encoder to get the final image embedding?

Aug 06 '22 09:08 JacobKong

@JacobKong I recommend inputting both the image and an instruction text.

Aug 06 '22 15:08 logicwong

@JacobKong And I think it's better to reuse the instruction " what does the image describe? "

Aug 06 '22 15:08 logicwong

Thanks! I'll try it.

What about the text embedding? What's the best instruction to use when extracting a text embedding?

Aug 06 '22 16:08 JacobKong

@JacobKong I'm not sure... you may need to try it yourself.

Aug 07 '22 05:08 logicwong