RFC: Proposal to Update vecs Python Client to Include Latest pgvector Functionalities
Summary
This RFC proposes adding support for the latest pgvector features to the vecs Python client. These include new vector types (halfvec, sparsevec), enhanced indexing capabilities, and additional vector functions (binary_quantize, hamming_distance, etc.).
Rationale
Recent pgvector releases added new vector types, improved indexing, and new functions, all of which are currently missing from the vecs client. Integrating them would bring the client to feature parity, enabling more efficient storage, additional similarity metrics, and extended vector operations, which in turn would support a broader range of use cases.
Design
Proposed Additions
- Vector Types:
  - halfvec: Half-precision vectors for reduced storage and faster operations.
  - sparsevec: Sparse vectors that store only non-zero values to reduce memory usage.
- Indexing Enhancements:
  - bit Type Indexing: Add support for indexing vectors stored as the bit type.
  - L1 Distance with HNSW: Add support for using L1 distance with HNSW indexing for similarity searches.
- New Functions:
  - binary_quantize: Converts a vector into a bit vector by thresholding each component at zero.
  - hamming_distance: Calculates the Hamming distance between binary vectors.
  - jaccard_distance: Computes the Jaccard distance between binary vectors.
  - l2_normalize: Normalizes vectors to unit length.
  - subvector: Extracts a subvector from the main vector.
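To make the intended semantics of the functions above concrete, here is a minimal pure-Python sketch. It is not the vecs API and not pgvector's implementation; the functions only mirror the pgvector names for readability. The 1-based `start` in `subvector`, the handling of the zero vector in `l2_normalize`, and the return value of `jaccard_distance` when no bits are set are assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def binary_quantize(vec: Sequence[float]) -> List[int]:
    """Map each component to one bit: 1 if strictly positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Return 1 - |a AND b| / |a OR b|, computed over the set bits."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        # Assumption for this sketch: two all-zero vectors are maximally distant.
        return 1.0
    return 1.0 - both / either


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute component differences (the metric behind L1 indexing)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        # Assumption for this sketch: the zero vector is returned unchanged.
        return list(vec)
    return [x / norm for x in vec]


def subvector(vec: Sequence[float], start: int, count: int) -> List[float]:
    """Return `count` components beginning at 1-based position `start`."""
    if start < 1 or count < 1:
        raise ValueError("start and count must be positive")
    return list(vec[start - 1 : start - 1 + count])
```

A client-side wrapper would most likely delegate these to the database rather than compute them in Python; the sketch is only meant to pin down what each name is expected to return.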
Examples
Creating a halfvec vector (proposed API):
from vecs import halfvec
vec = halfvec([1.0, 2.0, 3.0])
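The trade-offs behind the two proposed vector types can be illustrated without a database. The sketch below uses only the Python standard library. The helper names (`to_half_precision`, `component_bytes`, `to_sparsevec_text`) are hypothetical and exist only for this illustration; they are not part of vecs or pgvector. The sparse text layout `{index:value,...}/dimensions` with 1-based indices follows the format described in the pgvector README.

```python
import struct
from typing import List, Sequence


def to_half_precision(vec: Sequence[float]) -> List[float]:
    """Round-trip each component through IEEE 754 binary16 (struct format 'e')."""
    fmt = f"<{len(vec)}e"
    return list(struct.unpack(fmt, struct.pack(fmt, *vec)))


def component_bytes(dim: int, half: bool) -> int:
    """Bytes used by the components alone (any per-value header is ignored)."""
    return struct.calcsize(f"<{dim}{'e' if half else 'f'}")


def to_sparsevec_text(vec: Sequence[float]) -> str:
    """Render a dense vector as sparse text, keeping only non-zero entries."""
    entries = ",".join(
        f"{i}:{x:g}" for i, x in enumerate(vec, start=1) if x != 0
    )
    return "{" + entries + "}/" + str(len(vec))


# Half precision halves the component storage...
print(component_bytes(1536, half=False), component_bytes(1536, half=True))

# ...at the cost of rounding values that binary16 cannot represent exactly.
print(to_half_precision([1.0, 2.0, 3.0]), to_half_precision([0.1]))

# Sparse form stores only the non-zero positions plus the dimension count.
print(to_sparsevec_text([1.0, 0.0, 2.0, 0.0, 3.0]))
```

Small integers such as those in the example above survive the half-precision round trip unchanged, while a value like 0.1 comes back slightly different, which is the precision cost to weigh against the storage saving.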
Opened a PR to support L1 distance.
The refactor for halfvec support is more significant, but we're interested in supporting that too.
At this point I don't think we'll add support for sparsevec or bit. If the use cases for those vector types take off, we'll revisit that decision.
Having support for MaxSim would also be great.
Regarding "MaxSim": could you provide a reference? I don't see any references to MaxSim in the pgvector docs.
@olirice here is a reference in pgvector.
Having support like in Qdrant or Vespa would be nice. Happy to help with the implementation if you guide me.
Multi-vector queries would be a good stand-alone feature request if you'd like to open a new issue for it.
This is the first I've seen of it, so I'd be happy to leave it open for a few weeks and see what the feedback looks like.
Hello, I know this is an older thread, but is there still any interest in supporting halfvec?
I'm not sure whether it is seen as beyond the scope of this project, but as higher-dimensional embeddings become more common, I expect this to become an increasingly frequent request in typical data science workflows.