Can OFA perform fast retrieval tasks like dual-stream models?
Hi there:
Can OFA perform fast retrieval tasks like dual-stream models? I want to get an embedding for each image or text, save it, and use the saved embeddings for fast retrieval by calculating cosine similarity, instead of inputting every text-image pair.
Hope you can give me a hint!
Thanks a lot!
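To make the goal concrete, here is a minimal sketch of the retrieval step described above: embeddings are normalized and saved once, and each query is scored against them by cosine similarity. It uses only the Python standard library. The vectors are toy stand-ins, not real OFA outputs, and the helper names (`l2_normalize`, `build_index`, `search`) are hypothetical, not part of OFA.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def build_index(embeddings):
    """Normalize once up front so a dot product equals cosine similarity."""
    return [l2_normalize(e) for e in embeddings]

def search(index, query, top_k=1):
    """Return (position, cosine similarity) for the top_k closest entries."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, item)) for item in index]
    ranked = sorted(range(len(index)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k]]

# Toy stand-ins for saved image embeddings (assumption: real ones would
# come from the model and have far more dimensions).
image_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
index = build_index(image_embeddings)

# Toy stand-in for a text query embedding.
text_embedding = [0.9, 0.1, 0.0]
results = search(index, text_embedding, top_k=2)
print(results)
```

For a large collection, the same idea would usually be run as a single matrix product or through an approximate nearest-neighbor library rather than a Python loop.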
@JacobKong Hi, I think you can use the hidden state of <BOS> on the encoder side, or the hidden state of <EOS> on the decoder side, for retrieval. However, since OFA's pre-training tasks do not include a contrastive loss, you will need to fine-tune OFA with a contrastive loss before applying it to the retrieval task.
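For reference, one common form of contrastive loss is the symmetric InfoNCE objective used by dual-stream models such as CLIP. The reply above does not say which variant to use, so treat the formulation below and the temperature of 0.07 as assumptions. The sketch is plain Python that only shows how the loss is computed for a batch of matched image/text embeddings; actual fine-tuning would need an autodiff framework, and the function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def log_softmax_at(logits, target):
    """Log-probability of the target index under a softmax over logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[target] - log_sum

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: the i-th image and i-th text form the positive pair.

    The temperature value is an assumption, not something taken from OFA.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # sim[r][c] is the scaled cosine similarity of image r and text c.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    # Image-to-text direction: each row should pick its own column.
    i2t = -sum(log_softmax_at(sim[k], k) for k in range(n)) / n
    # Text-to-image direction: each column should pick its own row.
    t2i = -sum(log_softmax_at([sim[r][k] for r in range(n)], k)
               for k in range(n)) / n
    return 0.5 * (i2t + t2i)

# Toy batch: pairs that line up give a much lower loss than swapped pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, shuffled)
```

Minimizing this pulls matched image and text embeddings together and pushes mismatched ones apart, which is what makes plain cosine similarity meaningful at retrieval time.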
Thanks a lot for the reply! So should I input only the image, or both the image and an instruction text, to the encoder to get the final image embedding?
@JacobKong I recommend inputting both the image and an instruction text.
@JacobKong And I think it's better to reuse the instruction " what does the image describe? "
Thanks! I'll try it.
What about text embedding? What's the best instruction to input for extracting text embedding?
@JacobKong I'm not sure... you may need to try it yourself.