Support for sparse embeddings
The latest pgvector version supports sparsevec.
However, langchain's PGVector supports only one embedding column in the langchain_pg_embedding table.
It would be great to have a sparse_embedding column in that table and a matching sparse_embedding field in PGVector.
I have considered the alternative of running two PGVector stores, one for dense and one for sparse vectors. However, there are two problems with that:
- PGVector has hardcoded table names for the collection and embedding tables.
- I would like to keep using the excellent langchain indexer with the SQL record manager.
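To make the request concrete, here is a minimal sketch of what would be stored in such a column. pgvector writes sparse vectors as `{index:value,...}/dimensions` with 1-based indices. The helper name `to_sparsevec_literal`, the `ALTER TABLE` statement, and the dimension `30522` are assumptions for illustration only and are not part of langchain-postgres; pgvector-python also provides its own sparse vector helper, which would likely be the better choice in real code.

```python
from typing import Mapping


def to_sparsevec_literal(values: Mapping[int, float], dimensions: int) -> str:
    """Format 0-based {index: value} pairs as a pgvector sparsevec text literal.

    pgvector uses 1-based indices in the text form, so each index is shifted by one.
    """
    for index in values:
        if not 0 <= index < dimensions:
            raise ValueError(f"index {index} out of range for {dimensions} dimensions")
    # Zero entries carry no information in a sparse vector, so they are skipped.
    pairs = sorted((i + 1, v) for i, v in values.items() if v != 0)
    body = ",".join(f"{i}:{float(v)!r}" for i, v in pairs)
    return f"{{{body}}}/{dimensions}"


# Hypothetical DDL: the column name and the dimension are placeholders,
# not something PGVector creates today.
ADD_SPARSE_COLUMN = (
    "ALTER TABLE langchain_pg_embedding "
    "ADD COLUMN sparse_embedding sparsevec(30522)"
)

print(to_sparsevec_literal({0: 1.0, 2: 2.0, 4: 3.0}, 5))  # {1:1.0,3:2.0,5:3.0}/5
```

The literal can be passed as a bound parameter and cast to `sparsevec` on the SQL side, which avoids building the value into the query string.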
Hi @magaton, I would be interested in collaborating on this. I would also like some kind of full-text/dense feature: https://github.com/langchain-ai/langchain-postgres/issues/61
Hello, I would also be interested.
But I think each vector DB should be kept separate. So for hybrid search it would be:
- One dense embedding vector DB (using the current feature)
- One sparse vector DB (using https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search/cross_encoder.py)
The results would then be reranked using EnsembleRetriever (for example: https://python.langchain.com/docs/how_to/ensemble_retriever/).
To achieve this, we should also bump the pgvector Python version: #82
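The fusion step mentioned above can be sketched without any dependencies. As far as I understand, EnsembleRetriever combines its retrievers' results with a weighted form of Reciprocal Rank Fusion; the function below is a standalone illustration of that idea, not the library's actual code. The name `weighted_rrf`, the document ids, and the constant `c=60` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
    c: int = 60,
) -> List[str]:
    """Fuse several ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks start at 1, so the top hit of a list contributes weight / (1 + c).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; ties are broken by id to keep the order deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a dense store and a sparse store.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

print(weighted_rrf([dense_hits, sparse_hits], weights=[0.5, 0.5]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because the fusion only looks at ranks, the dense and sparse stores do not need comparable distance scores, which is what makes the two-store setup workable.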
Hi, I could really do with this feature. I have made a very crude PR that suggests how this might be done. I would appreciate some help, as I do not know this codebase well :) https://github.com/langchain-ai/langchain-postgres/pull/204