
"Static vector blobs" for fast, immutable, in-memory vector stores

Open asg017 opened this issue 1 year ago • 0 comments

The goal: offer an API for an in-memory vector store with very fast KNN queries, like Faiss's IndexFlat or hnswlib's "brute force" index, without storing the vectors inside a SQLite database.

The current setup allows you to do KNN searches in one of two ways:

  • Vectors are stored as regular columns in regular tables, and you call vec_distance_l2() manually with ORDER BY/LIMIT
  • Vectors are stored in vec0 tables, where chunked vectors are compared under the hood
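
For reference, the first approach has roughly the following shape. This is a minimal sketch using Python's built-in sqlite3 module. To keep it runnable without loading the extension, `l2_distance` is a stand-in Python function registered on the connection, not sqlite-vec's vec_distance_l2(), and the `items` table and its columns are made up for illustration.

```python
import math
import sqlite3
import struct


def pack_f32(values):
    """Serialize a list of floats as a little-endian float32 blob."""
    return struct.pack(f"<{len(values)}f", *values)


def l2_distance(a, b):
    """Stand-in for vec_distance_l2(): Euclidean distance of two float32 blobs."""
    n = len(a) // 4
    va = struct.unpack(f"<{n}f", a)
    vb = struct.unpack(f"<{n}f", b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


db = sqlite3.connect(":memory:")
db.create_function("l2_distance", 2, l2_distance)

# Hypothetical table: vectors live in an ordinary blob column.
db.execute("create table items(id integer primary key, embedding blob)")
rows = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [5.0, 5.0]), (4, [0.5, 0.0])]
db.executemany(
    "insert into items(id, embedding) values (?, ?)",
    [(i, pack_f32(v)) for i, v in rows],
)

# Manual KNN: compute the distance for every row, then ORDER BY/LIMIT.
query = pack_f32([0.0, 0.1])
knn = db.execute(
    "select id, l2_distance(embedding, ?) as distance "
    "from items order by distance limit 2",
    (query,),
).fetchall()
print(knn)
```

Every row's blob is read out of a database page and handed to the distance function, which is the per-row overhead that the rest of this issue is trying to avoid.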

Both of these are "fast enough", but neither is as fast as keeping all the vectors in one contiguous block of memory. In both cases the vectors are stored in the SQLite database, which means that 1) queries may incur disk I/O, and 2) vectors are spread across database pages rather than laid out contiguously.
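
To make "contiguous block" concrete, here is a rough sketch of a flat brute-force scan over a single float32 buffer, in the spirit of an IndexFlat-style index. It only illustrates the access pattern (row i starts at offset i * dimensions), not the speed: a real implementation would be in C with SIMD. The function name `knn_flat` and the sample data are made up for illustration.

```python
import heapq
from array import array


def knn_flat(data, dimensions, query, k):
    """Return the k smallest (squared_l2_distance, row_index) pairs.

    `data` is one contiguous float32 buffer holding count * dimensions values,
    so row i occupies data[i * dimensions : (i + 1) * dimensions].
    """
    count = len(data) // dimensions

    def squared_distance(i):
        base = i * dimensions
        return sum((data[base + j] - query[j]) ** 2 for j in range(dimensions))

    # Scan every row once and keep only the k closest.
    return heapq.nsmallest(k, ((squared_distance(i), i) for i in range(count)))


# Four 2-dimensional vectors stored back to back in a single buffer.
vectors = array("f", [0.0, 0.0, 1.0, 1.0, 5.0, 5.0, 0.5, 0.0])
result = knn_flat(vectors, 2, [0.0, 0.1], 2)
print(result)
```

Squared distances are enough for ranking, since taking the square root does not change the order.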

Possible API

We'd need a way for clients to "register" in-memory blobs, and to perform KNN-style queries on those blobs.

Option one: a single temp.vec_static_blobs virtual table

insert into temp.vec_static_blobs(key, data_pointer, data_element_type, dimensions, count, ids_pointer, id_element_type)
  select 'foo', ?, 'float', 768, 1000000, ?, 'int32';

Kind of exhaustive. Maybe narrow it down to just data and ids, and move the parameter handling into scalar functions?

insert into temp.vec_static_blobs(key, data, ids)
  values ('foo', vec_from_pointer(?), vec_from_pointer(?));
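
On the client side, either form needs an address plus enough metadata (element type, dimensions, count) to interpret the memory behind it. Below is a minimal sketch of where those values could come from in Python, assuming the buffers are held in `array` objects. How the pointer would actually be bound to the `?` parameters is not decided in this issue, so the sketch stops at collecting the values and checking that the raw address really points at the data.

```python
import ctypes
from array import array

dimensions = 4

# Three 4-dimensional float32 vectors, stored back to back.
vectors = array("f", [float(i) for i in range(3 * dimensions)])
# One id per vector ("i" is a 32-bit int on common platforms).
ids = array("i", [10, 20, 30])

# buffer_info() returns (address, length in elements).
data_pointer, n_floats = vectors.buffer_info()
ids_pointer, n_ids = ids.buffer_info()
count = n_floats // dimensions

# Sanity check: read row 1 back through the raw address, the way native
# code holding only the pointer, dimensions, and count would.
view = (ctypes.c_float * n_floats).from_address(data_pointer)
row1 = list(view[dimensions:2 * dimensions])
print(count, n_ids, row1)
```

The addresses are only valid while `vectors` and `ids` stay alive and are not resized, which is part of why the blobs are described as static and immutable.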

And KNN queries?

select
  rowid,
  distance
from temp.vec_static_blobs('foo')
where vector match ?
order by distance
limit 20;

asg017, May 12 '24 06:05