What is the implementation logic of MATCH in sql
SELECT rowid, distance FROM vec_examples WHERE sample_embedding MATCH '[0.890, 0.544, 0.825, 0.961, 0.358, 0.0196, 0.521, 0.175]' order by distance limit 2;
Hello, What does the SQL statement MATCH do in the above example. Where does distance come from, Does it calculate the Euclidean distance?
SELECT rowid, distance FROM vec_examples WHERE sample_embedding MATCH '[0.890, 0.544, 0.825, 0.961, 0.358, 0.0196, 0.521, 0.175]' order by distance limit 2;
Hello, What does the SQL statement MATCH do in the above example. Where does distance come from, Does it calculate the Euclidean distance?
I asked @asg017 over discord and here is the answer:
by default it's L2 distance, can be configured to something else with distance_metric:
create virtual table vec_items using vec0( contents_embedding float[768] distance_metric=cosine )
Speaking as an SQL developer, while I'm delighted with this extension's concept, I don't like the syntax. It is confounding: a WHERE clause in SQL is meant to filter rows that satisfy a predicate, but here - if I understand correctly - it doesn't filter rows at all and sample_embedding MATCH '[0.890, 0.544, 0.825, 0.961, 0.358, 0.0196, 0.521, 0.175]' is not actually a predicate (a boolean expression). So, instead of filtering rows, WHERE in this particular case adds a calculated column with opaque semantics (cosine, L2, or perhaps something else, you'd need to consult the table definition beforehand).
Okay, I'm guessing this WHERE+LIMIT thing is supposed to do behind-the-scenes indexing, so this unfortunate syntax is perhaps a necessity given the SQLite's virtual table framework.