"Static vector blobs" for fast, immutable, in-memory vector stores
The goal: offer an API for an in-memory vector store with very fast KNN queries, like Faiss's IndexFlat or hnswlib's "brute force" index, without storing the vectors inside a SQLite database.
The current setup allows you to do KNN searches in one of two ways:
- Vectors are stored as regular columns in regular tables, so use vec_distance_l2() manually with ORDER BY / LIMIT
- Vectors are stored in vec0 tables, where chunked vectors are compared under-the-hood
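For reference, the first approach can be sketched with Python's built-in sqlite3 module. This is only an illustration of the query shape: to stay self-contained it registers a pure-Python stand-in under the name vec_distance_l2 instead of loading the real extension, and the items table, its columns, and the pack helper are made up for the example. It also assumes vectors are stored as little-endian float32 BLOBs.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as a little-endian float32 BLOB."""
    return struct.pack(f"<{len(vec)}f", *vec)


def l2_standin(a, b):
    """Stand-in for vec_distance_l2(): Euclidean distance of two float32 BLOBs."""
    n = len(a) // 4
    xs = struct.unpack(f"<{n}f", a)
    ys = struct.unpack(f"<{n}f", b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


db = sqlite3.connect(":memory:")
# Assumption: a Python function takes the place of the extension's SQL function.
db.create_function("vec_distance_l2", 2, l2_standin)

# Hypothetical table: vectors live in a regular BLOB column.
db.execute("create table items(id integer primary key, embedding blob)")
db.executemany(
    "insert into items(id, embedding) values (?, ?)",
    [
        (1, pack([0.0, 0.0, 0.0])),
        (2, pack([1.0, 0.0, 0.0])),
        (3, pack([5.0, 5.0, 5.0])),
    ],
)

# Manual KNN: compute the distance for every row, then ORDER BY / LIMIT.
query = pack([0.9, 0.0, 0.0])
rows = db.execute(
    """
    select id, vec_distance_l2(embedding, ?) as distance
    from items
    order by distance
    limit 2
    """,
    (query,),
).fetchall()
```

Every row's BLOB is read out of a database page and decoded before it can be compared, which is the overhead this proposal is trying to avoid.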
Both of these are "fast enough", but not as fast as keeping all the vectors in one contiguous block of memory. In both cases above, vectors are stored in the SQLite database, which means 1) possible disk I/O on queries, and 2) vectors live in database pages, so they are not contiguous.
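To make "contiguous block of memory" concrete, here is a minimal sketch of a brute-force KNN scan over one flat float32 buffer, using only the Python standard library. The function name knn_flat and the row-major layout (vector i occupies elements i * dimensions up to (i + 1) * dimensions) are assumptions for illustration. Pure Python will not deliver the speed being discussed; the point is the memory layout and the access pattern a native implementation would use.

```python
import heapq
import math
from array import array


def knn_flat(data, dimensions, query, k):
    """Return the k nearest (row_index, l2_distance) pairs from a flat buffer.

    `data` holds count * dimensions float32 values laid out back to back.
    """
    if len(data) % dimensions != 0:
        raise ValueError("buffer length is not a multiple of dimensions")
    if len(query) != dimensions:
        raise ValueError("query has the wrong number of dimensions")
    count = len(data) // dimensions

    def distance(i):
        # Vector i starts at a fixed offset: no page lookups, no decoding.
        base = i * dimensions
        return math.sqrt(
            sum((data[base + j] - query[j]) ** 2 for j in range(dimensions))
        )

    # Scan every vector once and keep only the k smallest distances.
    return heapq.nsmallest(
        k, ((i, distance(i)) for i in range(count)), key=lambda pair: pair[1]
    )


# Four 2-dimensional vectors stored in a single contiguous array.
vectors = array("f", [0.0, 0.0, 1.0, 0.0, 5.0, 5.0, 0.0, 2.0])
nearest = knn_flat(vectors, 2, array("f", [0.0, 0.5]), k=2)
```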
Possible API
We'd need a way for clients to "register" in-memory blobs, and to perform KNN-style queries on those blobs.
Option one: a single temp.vec_static_blobs virtual table
insert into temp.vec_static_blobs(key, data_pointer, data_element_type, dimensions, count, ids_pointer, id_element_type)
select 'foo', ?, 'float', 768, 1000000, ?, 'int32';
Kind of exhaustive. Maybe narrow it down to data and ids, and move the parameter handling to scalar functions?
insert into temp.vec_static_blobs(key, data, ids)
values ('foo', vec_from_pointer(?), vec_from_pointer(?));
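On the client side, either variant needs the same facts about the blob: where it starts, its element type, the number of dimensions, and how many vectors it holds. The sketch below shows one way a Python caller might gather them with array.buffer_info(), which returns the buffer's address and its length in elements. StaticBlobDescriptor and describe are hypothetical names, vec_from_pointer does not exist yet, and how the address would actually be bound to a statement is left open.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticBlobDescriptor:
    """Hypothetical bundle of the parameters from the first INSERT above."""
    key: str
    data_pointer: int
    data_element_type: str
    dimensions: int
    count: int


def describe(key, data, dimensions):
    """Build a descriptor for a flat float32 array without copying it."""
    if data.typecode != "f":
        raise TypeError("expected a float32 array (typecode 'f')")
    address, n_elements = data.buffer_info()
    if n_elements % dimensions != 0:
        raise ValueError("buffer length is not a multiple of dimensions")
    return StaticBlobDescriptor(
        key=key,
        data_pointer=address,
        data_element_type="float",
        dimensions=dimensions,
        count=n_elements // dimensions,
    )


# Four 768-dimensional vectors in one contiguous buffer.
vectors = array("f", [0.0] * (768 * 4))
descriptor = describe("foo", vectors, 768)
```

Because only the address is handed over, the caller has to keep the buffer alive and must not resize it for as long as it stays registered. That fits the "static" and "immutable" framing of the proposal.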
And KNN queries?
select
  rowid,
  distance
from temp.vec_static_blobs('foo')
where vector match ?
order by distance
limit 20;