Can OFA perform fast retrieval task as dual-stream-like models?

Open JacobKong opened this issue 3 years ago • 6 comments

Hi there:

Can OFA perform fast retrieval the way dual-stream models do? I want to get an embedding for each image or text, save it, and then do fast retrieval by computing cosine similarity over the saved embeddings, instead of feeding every text-image pair through the model.
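To make the goal concrete, here is a minimal sketch of the retrieval step only, assuming the embeddings have already been extracted and saved. It uses plain Python lists and made-up 3-d vectors as stand-ins; the helper names (`l2_normalize`, `build_index`, `retrieve`) are hypothetical and not part of OFA.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings):
    """Normalize every stored embedding once, ahead of query time."""
    return [l2_normalize(e) for e in embeddings]


def retrieve(query, index, top_k=1):
    """Return (position, cosine similarity) pairs for the top_k closest entries."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, item)) for item in index]
    ranked = sorted(range(len(index)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k]]


# Toy vectors standing in for saved image embeddings (made-up values).
image_index = build_index([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0]])

# Toy vector standing in for a text embedding used as the query.
hits = retrieve([0.0, 3.0, 0.1], image_index, top_k=2)
```

Because the index is normalized once up front, each query costs one dot product per stored item, with no forward pass over text-image pairs.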

Hope you can give me a hint!

Thanks a lot!

JacobKong avatar Aug 03 '22 12:08 JacobKong

@JacobKong Hi, I think you can use the hidden states of <BOS> on the encoder side, or the hidden states of <EOS> on the decoder side, for retrieval. However, since the pre-training tasks of OFA do not include a contrastive loss, it is necessary to fine-tune OFA with a contrastive loss before applying it to the retrieval task.
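For illustration, here is a rough sketch of one common contrastive objective, the symmetric InfoNCE loss used by CLIP-style dual encoders. This is an assumption: the thread does not say which contrastive loss to use, and the function names and the `temperature=0.07` default are hypothetical. It runs on plain Python lists, so it shows only the arithmetic, not a training loop.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image InfoNCE over a batch.

    image_embs[i] and text_embs[i] are assumed to be a matching pair;
    every other combination in the batch is treated as a negative.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, scaled.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image view
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: matching pairs point the same way, so the loss is near zero.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
# Same batch with the texts swapped, so every pair is mismatched.
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                      [[0.0, 1.0], [1.0, 0.0]])
```

In an actual fine-tuning run the same computation would be done on batched tensors in the training framework, with the embeddings taken from the hidden states mentioned above.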

logicwong avatar Aug 06 '22 04:08 logicwong

Thanks a lot for the reply! So should I input only the image, or both the image and an instruction text, to the encoder to get the final image embedding?

JacobKong avatar Aug 06 '22 09:08 JacobKong

@JacobKong I recommend inputting both the image and an instruction text.

logicwong avatar Aug 06 '22 15:08 logicwong

@JacobKong And I think it's better to reuse the instruction " what does the image describe? "

logicwong avatar Aug 06 '22 15:08 logicwong

Thanks! I'll try it.

What about text embeddings? What's the best instruction to use as input when extracting a text embedding?

JacobKong avatar Aug 06 '22 16:08 JacobKong

@JacobKong I'm not sure... you may need to try it yourself.

logicwong avatar Aug 07 '22 05:08 logicwong