haystack Verify compatibility between `Data2VecVision` models and existing retrievers

Context

Part of #2418
After the refactoring of language_model.py and tokenization.py, adding support for new model types in Haystack has become much simpler.
However, the framework is still heavily oriented towards question answering on text, and this assumption is embedded in many parts of the stack.

Goal

Verify whether any existing retriever can load an image retrieval model such as Data2VecVision with only minor changes.
- If it can, consider a small refactoring to make the code paths more generic (e.g. rename get_tokenizer to get_feature_extractor, and so on).
- If it cannot, even with minor adaptations, consider creating a separate ImageRetriever class that can. Also evaluate whether the underlying stack (Inferencer, Processor, AdaptiveModel, etc.) can be reused, and to what degree.

Jul 21 '22 14:07 ZanSara

An attempt to generalize TableTextRetriever to work with images quickly proved too complex for the scope of this issue.

Rather than modifying an existing Retriever and risking breaking working code, I opted to clone TableTextRetriever and its stack of supporting classes, and to make the changes needed to support N models rather than just 3 (query, text and tables).

The goal of this issue then changes to the following:

Create a multimodal retriever called MultiModalRetriever by generalizing the concepts introduced by TableTextRetriever.
This introduces a stack of new subclasses to support such a retriever, such as:
- MultiAdaptiveModel (from TriAdaptiveModel)
- EmbeddingSimilarityHead (from TextSimilarityHead)
- MultiModalSimilarityProcessor (from TableTextSimilarityProcessor)
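The core of the generalization is replacing the three fixed encoders of TriAdaptiveModel (query, text, tables) with an arbitrary mapping from modality to encoder, then scoring every document embedding against the query embedding. The following is a minimal, framework-free sketch of that idea under stated assumptions: `MultiEncoderSketch`, `rank` and the toy encoders are hypothetical names used only for illustration, they do not reflect the actual MultiAdaptiveModel or EmbeddingSimilarityHead API, and dot-product similarity is assumed as the scoring function.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[object], Vector]


class MultiEncoderSketch:
    """Hypothetical N-encoder model: one encoder per modality instead of a fixed three."""

    def __init__(self, encoders: Dict[str, Encoder]):
        # e.g. {"query": ..., "text": ..., "table": ..., "image": ...}
        self.encoders = dict(encoders)

    def embed(self, modality: str, item: object) -> Vector:
        if modality not in self.encoders:
            raise KeyError(f"No encoder registered for modality '{modality}'")
        return self.encoders[modality](item)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed similarity function: plain dot product between embeddings.
    if len(a) != len(b):
        raise ValueError("Embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def rank(
    model: MultiEncoderSketch,
    query: str,
    documents: Iterable[Tuple[str, object]],
) -> List[Tuple[float, str, object]]:
    """Score (modality, content) documents against a query, highest score first."""
    q = model.embed("query", query)
    scored = [(dot(q, model.embed(mod, content)), mod, content) for mod, content in documents]
    return sorted(scored, key=lambda t: t[0], reverse=True)


if __name__ == "__main__":
    # Toy encoders standing in for real text / image models (hypothetical).
    model = MultiEncoderSketch({
        "query": lambda s: [1.0, 0.0],
        "text": lambda s: [0.5, 0.5],
        "image": lambda pixels: [float(sum(pixels)), 0.0],
    })
    for score, modality, _ in rank(model, "a cat", [("text", "a cat"), ("image", [1, 2])]):
        print(modality, score)
```

Adding a new modality in this sketch only means registering another encoder in the dictionary, which is the property that motivates moving from a fixed three-model design to N models.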

Note that this Retriever will NOT be tested in pipelines, only in isolation. It will also most likely stay undocumented. See https://github.com/deepset-ai/haystack/issues/2418 for the rationale.

Jul 27 '22 10:07 ZanSara

Continues in #2857

Aug 30 '22 08:08 ZanSara