Support Sparse Embedding Retrieval
This is a feature the community has asked for, and it is currently supported by Qdrant and Pinecone.
**Update**: I experimented with the complete round trip: from `Document` to sparse embedding stored in Qdrant/Pinecone, and then querying (notebook).
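For context, a sparse embedding is usually stored as parallel lists of token indices and weights, and two sparse embeddings are compared with a dot product over their shared indices. A minimal self-contained sketch (the class name and fields here are illustrative, not necessarily Haystack's actual `SparseEmbedding` API):

```python
from dataclasses import dataclass


@dataclass
class SparseEmbedding:
    # Parallel lists: indices[i] is a vocabulary/token id, values[i] its weight.
    indices: list[int]
    values: list[float]


def sparse_dot(a: SparseEmbedding, b: SparseEmbedding) -> float:
    """Dot product over the indices the two sparse vectors share."""
    weights_b = dict(zip(b.indices, b.values))
    return sum(v * weights_b.get(i, 0.0) for i, v in zip(a.indices, a.values))


doc = SparseEmbedding(indices=[3, 17, 42], values=[0.9, 0.4, 0.1])
query = SparseEmbedding(indices=[17, 99], values=[1.0, 0.5])
print(sparse_dot(doc, query))  # only index 17 overlaps -> 0.4
```

This is why sparse retrieval stays cheap: only the non-zero dimensions are stored and scored, and stores like Qdrant index them natively.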
What we need to do:
- [x] Investigate/design the integration
- [x] Introduce SparseEmbedding class and add it to Document
- [ ] https://github.com/deepset-ai/haystack-core-integrations/issues/604
- [x] Release the SparseEmbedding class in 2.0.1
- [x] Introduce a first Sparse Embedder (https://github.com/deepset-ai/haystack-core-integrations/pull/579)
- [x] Make Qdrant write sparse embeddings (https://github.com/deepset-ai/haystack-core-integrations/pull/578)
- [x] Introduce Qdrant Sparse Embedding Retriever (https://github.com/deepset-ai/haystack-core-integrations/pull/578)
- [x] Non-urgent: understand the problems related to the Qdrant Hybrid Retriever
- [ ] https://github.com/deepset-ai/haystack-core-integrations/issues/695
- [ ] https://github.com/deepset-ai/haystack-core-integrations/issues/660
- [ ] https://github.com/deepset-ai/haystack-core-integrations/pull/675
- [ ] Announce the feature on social media
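The round trip the checklist builds toward can be sketched end to end with an in-memory stand-in for the document store (the store and retriever below are toy illustrations, not the Qdrant/Pinecone or Haystack APIs):

```python
# Toy end-to-end round trip: write documents with sparse embeddings,
# then retrieve the top-k matches for a sparse query embedding.
# Sparse vectors are dicts {token_id: weight}; real stores index these natively.

def sparse_score(query: dict, doc: dict) -> float:
    """Dot product over the token ids the query and document share."""
    return sum(w * doc.get(i, 0.0) for i, w in query.items())


class InMemorySparseStore:
    def __init__(self):
        self._docs = []  # list of (content, sparse_embedding) pairs

    def write(self, content: str, sparse_embedding: dict) -> None:
        self._docs.append((content, sparse_embedding))

    def retrieve(self, query_embedding: dict, top_k: int = 3):
        scored = [(sparse_score(query_embedding, emb), text)
                  for text, emb in self._docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = InMemorySparseStore()
store.write("about cats", {1: 0.8, 2: 0.3})
store.write("about dogs", {3: 0.9, 2: 0.1})
result = store.retrieve({1: 1.0}, top_k=1)
print(result)  # [(0.8, 'about cats')]
```

In the real pipeline, the sparse embedder (e.g. the FastEmbed integration) produces the per-document and per-query sparse vectors, and Qdrant plays the role of the store.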
Note: As a first step, we now have a working sparse embedder in Haystack through the FastEmbed integration: https://github.com/deepset-ai/haystack-core-integrations/pull/579
By the way, it would be cool to have a general BM25 embedder in the core Haystack repo instead of relying only on the SPLADE embedder from FastEmbed :) since you already have haystack-bm25.
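A general BM25 embedder along those lines could turn each document into a sparse vector of per-term BM25 weights, with terms mapped to integer indices through a shared vocabulary. A rough sketch with the usual k1/b parameters (haystack-bm25's actual scoring may differ in details, and this function is hypothetical, not an existing API):

```python
import math
from collections import Counter


def bm25_sparse_embed(docs, k1: float = 1.5, b: float = 0.75):
    """Return one {term_index: weight} sparse vector per tokenized document."""
    vocab = {}      # term -> integer index
    df = Counter()  # document frequency per term
    for tokens in docs:
        for term in set(tokens):
            df[term] += 1
            vocab.setdefault(term, len(vocab))
    n = len(docs)
    avgdl = sum(len(tokens) for tokens in docs) / n
    embeddings = []
    for tokens in docs:
        vec = {}
        for term, freq in Counter(tokens).items():
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = freq * (k1 + 1) / (
                freq + k1 * (1 - b + b * len(tokens) / avgdl))
            vec[vocab[term]] = idf * norm
        embeddings.append(vec)
    return embeddings, vocab


embs, vocab = bm25_sparse_embed([["sparse", "retrieval"],
                                 ["dense", "retrieval"]])
# "sparse" appears in only one doc, so it gets a higher idf-driven weight
# than "retrieval", which appears in both.
```

Unlike SPLADE, this needs no model download, which is the appeal of shipping it in the core repo.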
Note: As a second step, the Qdrant integration can now support sparse vectors and is compatible with the FastEmbed sparse embedder above 👀 https://github.com/deepset-ai/haystack-core-integrations/pull/578