Support Sparse Embedding Retrieval
This is a feature the community has asked for, and it is currently supported by Qdrant and Pinecone.
**Update**: I experimented with the complete round trip: from `Document` to sparse embedding stored in Qdrant/Pinecone, and then querying (notebook).
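For context, a sparse embedding is usually stored as parallel lists of token indices and weights, and two sparse embeddings are compared with a dot product over their shared indices. A minimal self-contained sketch (the class name and fields here are illustrative, not necessarily Haystack's actual `SparseEmbedding` API):

```python
from dataclasses import dataclass


@dataclass
class SparseEmbedding:
    # Parallel lists: indices[i] is a vocabulary/token id, values[i] its weight.
    indices: list[int]
    values: list[float]


def sparse_dot(a: SparseEmbedding, b: SparseEmbedding) -> float:
    """Dot product over the indices the two sparse vectors share."""
    weights_b = dict(zip(b.indices, b.values))
    return sum(v * weights_b.get(i, 0.0) for i, v in zip(a.indices, a.values))


doc = SparseEmbedding(indices=[3, 17, 42], values=[0.9, 0.4, 0.1])
query = SparseEmbedding(indices=[17, 99], values=[1.0, 0.5])
print(sparse_dot(doc, query))  # only index 17 overlaps -> 0.4
```

This is why sparse retrieval stays cheap: only the non-zero dimensions are stored and scored, and stores like Qdrant index them natively.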
What we need to do:
- [x] Investigate/design the integration
- [x] Introduce SparseEmbedding class and add it to Document
- [ ] https://github.com/deepset-ai/haystack-core-integrations/issues/604
- [x] Release the SparseEmbedding class in 2.0.1
- [x] Introduce a first Sparse Embedder (https://github.com/deepset-ai/haystack-core-integrations/pull/579)
- [x] Make Qdrant write sparse embeddings (https://github.com/deepset-ai/haystack-core-integrations/pull/578)
- [x] Introduce Qdrant Sparse Embedding Retriever (https://github.com/deepset-ai/haystack-core-integrations/pull/578)
- [x] Non-urgent: understand the problems related to the Qdrant Hybrid Retriever
- [ ] https://github.com/deepset-ai/haystack-core-integrations/issues/695
- [ ] https://github.com/deepset-ai/haystack-core-integrations/issues/660
- [ ] https://github.com/deepset-ai/haystack-core-integrations/pull/675
- [ ] Announce the feature on social media
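The round trip the checklist builds toward can be sketched end to end with an in-memory stand-in for the document store (the store and retriever below are toy illustrations, not the Qdrant/Pinecone or Haystack APIs):

```python
# Toy end-to-end round trip: write documents with sparse embeddings,
# then retrieve the top-k matches for a sparse query embedding.
# Sparse vectors are dicts {token_id: weight}; real stores index these natively.

def sparse_score(query: dict, doc: dict) -> float:
    """Dot product over the token ids the query and document share."""
    return sum(w * doc.get(i, 0.0) for i, w in query.items())


class InMemorySparseStore:
    def __init__(self):
        self._docs = []  # list of (content, sparse_embedding) pairs

    def write(self, content: str, sparse_embedding: dict) -> None:
        self._docs.append((content, sparse_embedding))

    def retrieve(self, query_embedding: dict, top_k: int = 3):
        scored = [(sparse_score(query_embedding, emb), text)
                  for text, emb in self._docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = InMemorySparseStore()
store.write("about cats", {1: 0.8, 2: 0.3})
store.write("about dogs", {3: 0.9, 2: 0.1})
result = store.retrieve({1: 1.0}, top_k=1)
print(result)  # [(0.8, 'about cats')]
```

In the real pipeline, the sparse embedder (e.g. the FastEmbed integration) produces the per-document and per-query sparse vectors, and Qdrant plays the role of the store.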
Note: As a first step, we now have a working sparse embedder in Haystack through the FastEmbed integration: https://github.com/deepset-ai/haystack-core-integrations/pull/579
By the way, it would be cool to have a general BM25 embedder in the core Haystack repo instead of relying only on the SPLADE embedder from FastEmbed :) since you already have haystack-bm25.
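A general BM25 embedder along those lines could turn each document into a sparse vector of per-term BM25 weights, with terms mapped to integer indices through a shared vocabulary. A rough sketch with the usual k1/b parameters (haystack-bm25's actual scoring may differ in details, and this function is hypothetical, not an existing API):

```python
import math
from collections import Counter


def bm25_sparse_embed(docs, k1: float = 1.5, b: float = 0.75):
    """Return one {term_index: weight} sparse vector per tokenized document."""
    vocab = {}      # term -> integer index
    df = Counter()  # document frequency per term
    for tokens in docs:
        for term in set(tokens):
            df[term] += 1
            vocab.setdefault(term, len(vocab))
    n = len(docs)
    avgdl = sum(len(tokens) for tokens in docs) / n
    embeddings = []
    for tokens in docs:
        vec = {}
        for term, freq in Counter(tokens).items():
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = freq * (k1 + 1) / (
                freq + k1 * (1 - b + b * len(tokens) / avgdl))
            vec[vocab[term]] = idf * norm
        embeddings.append(vec)
    return embeddings, vocab


embs, vocab = bm25_sparse_embed([["sparse", "retrieval"],
                                 ["dense", "retrieval"]])
# "sparse" appears in only one doc, so it gets a higher idf-driven weight
# than "retrieval", which appears in both.
```

Unlike SPLADE, this needs no model download, which is the appeal of shipping it in the core repo.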
Note: As a second step, the Qdrant integration can now support sparse vectors and is compatible with the FastEmbed sparse embedder above 👀 https://github.com/deepset-ai/haystack-core-integrations/pull/578