
Support for sparse embeddings

Open magaton opened this issue 1 year ago • 3 comments

The latest pgvector version supports the sparsevec type. However, LangChain's PGVector supports only one embedding column in the langchain_pg_embedding table. It would be great to have a sparse_embedding column in that table and a matching sparse_embedding field in PGVector.
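For context, here is a minimal sketch of what a sparsevec value looks like in pgvector's text format, `{index:value,...}/dimensions`, with indices starting at 1. The helper name `to_sparsevec_literal` is hypothetical and is not part of pgvector or langchain-postgres; it only illustrates what a sparse_embedding column would store.

```python
def to_sparsevec_literal(dense):
    """Format a dense list of numbers as a pgvector sparsevec text literal.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Keep only the non-zero entries; sparsevec indices are 1-based.
    pairs = [f"{i}:{v}" for i, v in enumerate(dense, start=1) if v != 0]
    # The total number of dimensions follows the closing brace.
    return "{" + ",".join(pairs) + "}/" + str(len(dense))


print(to_sparsevec_literal([1, 0, 2, 0, 3]))  # {1:1,3:2,5:3}/5
```

A string in this form can be cast to sparsevec on the SQL side, for example `'{1:1,3:2,5:3}/5'::sparsevec`.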

I have considered the alternative of using two PGVector stores, one for dense and one for sparse vectors. However, there are two problems with that:

  • PGVector has hardcoded table names for the collection and embedding tables.
  • I would like to leverage LangChain's excellent indexer with the SQL record manager.

magaton, Jun 11 '24 10:06

Hi @magaton, I would be interested in collaborating on this. I would also like some kind of full-text/dense feature: https://github.com/langchain-ai/langchain-postgres/issues/61

gecBurton, Jun 19 '24 08:06

Hello, I would also be interested.

But I think each vector DB should be kept separate. So for a hybrid search it would be:

  • One dense embedding vector DB (using the current feature)
  • One sparse vector DB (using https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search/cross_encoder.py)

Then merge and rerank the two result lists with EnsembleRetriever (for example: https://python.langchain.com/docs/how_to/ensemble_retriever/ ).

To achieve this, we should also bump the pgvector Python version: #82
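
To make the merge step concrete: the LangChain documentation describes EnsembleRetriever as using Reciprocal Rank Fusion. Below is a rough, library-free sketch of weighted RRF over two ranked lists of document IDs. The function name `weighted_rrf`, the sample IDs, and the constant `c=60` are assumptions for illustration, not the library's actual code.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document IDs with weighted RRF.

    Illustrative sketch only: each document scores weight / (rank + c)
    in every list where it appears, and the scores are summed.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


dense_hits = ["a", "b", "c"]   # hypothetical results from the dense store
sparse_hits = ["b", "c", "d"]  # hypothetical results from the sparse store
print(weighted_rrf([dense_hits, sparse_hits], weights=[0.5, 0.5]))
# ['b', 'c', 'a', 'd']
```

Documents that appear in both lists ("b" and "c") rise to the top, which is the behaviour a dense-plus-sparse hybrid search is after.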

Freezaa9, Oct 23 '24 14:10

Hi, I could really do with this feature. I have made a very rough PR that suggests how this might be done. I would appreciate some help, as I do not know this codebase well :) https://github.com/langchain-ai/langchain-postgres/pull/204

gecBurton, Apr 27 '25 16:04