sqlite-vss

Support WASM compilation

Open klavinski opened this issue 2 years ago • 11 comments

This extension would be great to enable vector search in the browser. Is there a guide to add it to the WASM build? I tried studying sqlite-lines, unsuccessfully.

klavinski • Mar 31 '23 12:03

Would like to see it too, but it will be hard. Any SQLite extension in WASM is complicated, as you saw in sqlite-lines. Additionally, since sqlite-vss relies on Faiss, I'd imagine there are even more hurdles we'd have to jump through, in addition to this being written in C++ (which probably isn't too big of an issue, but foreign to me as I've only successfully compiled plain C SQLite extensions to WASM before).

Also, there are a bunch of different SQLite WASM targets now, each of which is slightly incompatible with the others. There's the official SQLite WASM build, sql.js, and probably a few more I don't know about.

I'm open to contributions, but if anyone reading this would like to give it a shot, please comment here with your approach before sending a PR. Additionally, if anyone wants to sponsor this work and has a clear goal in mind, I'd be more than happy to talk about it!

asg017 • Apr 10 '23 22:04

The official WASM build makes it easier to implement an extension. In my case, I settled on adding this one. I copied the code of the extension into the file ext/wasm/extra_init.c, then followed the official steps:

./configure --enable-all
make sqlite3.c
cd ext/wasm
make

This produces the .wasm and .js files with the extension enabled.

klavinski • Apr 11 '23 12:04

An option would be to have a version that does not depend on Faiss (separate branch?)

The HNSW algorithm is relatively simple, and there are some libraries for it, like hnswlib.

kroggen • Apr 11 '23 14:04
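
To illustrate why HNSW is often described as simple: its core step is a greedy walk over a neighbor graph. The TypeScript sketch below shows only that single-layer step on a toy graph. It is not hnswlib's code, and the names `l2`, `greedySearch`, `chainPoints` and `chainNeighbors` are made up for this example. Real HNSW stacks several such layers and keeps a list of candidates at the bottom layer rather than a single current node.

```typescript
// Squared Euclidean distance between two vectors of equal length.
function l2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Greedy walk on one graph layer: keep moving to the neighbor closest to the
// query until no neighbor is closer than the current node.
function greedySearch(
  points: number[][],
  neighbors: number[][],
  entry: number,
  query: number[],
): number {
  let current = entry;
  let best = l2(points[current], query);
  for (;;) {
    let next = current;
    for (const n of neighbors[current]) {
      const d = l2(points[n], query);
      if (d < best) {
        best = d;
        next = n;
      }
    }
    if (next === current) return current; // local minimum reached
    current = next;
  }
}

// Toy data (hypothetical): five points on a line, each linked to its
// immediate neighbors.
const chainPoints = [[0], [1], [2], [3], [4]];
const chainNeighbors = [[1], [0, 2], [1, 3], [2, 4], [3]];

// Start at index 0 and walk toward the query 3.2.
const nearest = greedySearch(chainPoints, chainNeighbors, 0, [3.2]);
console.log(nearest); // index of the closest point reached by the walk
```

The walk can stop at a local minimum on a poorly connected graph, which is one reason real implementations track multiple candidates and expose tuning parameters.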

Some have successfully compiled such algorithms to WASM.

klavinski • Apr 11 '23 19:04

> I copied the code of the extension into the file ext/wasm/extra_init.c, then followed the official steps:

Thanks for pointing out ext/wasm/extra_init.c! Seems like building for SQLite's WASM build is much easier than sql.js, at least since the last time I tried.

It'd still be difficult for sqlite-vss however, since Faiss is such a heavy and tricky-to-compile dependency. I haven't found any examples of Faiss being compiled to WASM. But @kroggen that hnswlib library may be a solution: I originally looked at that lib when building sqlite-vss, but chose Faiss since it had way more indexing options and flexible storage.

I don't think adding hnswlib to sqlite-vss would be easy to do, and I'd rather sqlite-vss stay with Faiss for now. However, I can totally see a new sqlite-hnsw project that uses hnswlib instead, and has a similar API to sqlite-vss but without a few bells and whistles. Plus, since hnswlib is header-only, it'll probably be very easy to compile to WASM.

I don't have the capacity now to start a new sqlite-hnsw project, but if anyone reading this wants to give it a try, would be more than happy to help!

asg017 • Apr 11 '23 19:04

> Some have successfully compiled such algorithms to WASM.

I also looked at hora when building sqlite-vss, which would've worked with sqlite-loadable-rs, but it seemed inactive and I couldn't find any nice APIs to serialize an index to a buffer. Also sqlite-loadable-rs is great for simple table functions and virtual tables, but isn't great at shadow tables yet, so it would've been a lot of work to implement. Also, building a SQLite extension in Rust and compiling it to WASM is incredibly difficult (maybe impossible?)

asg017 • Apr 11 '23 19:04

Just found https://github.com/jiggy-ai/hnsqlite/, but they don't have a WASM build.

jlarmstrongiv • Jul 20 '23 02:07

I did not update this issue, but for those still looking for a solution, I successfully used a combination of hnswlib, which stores the embeddings in IndexedDB, and SQLite for the rest.

klavinski • Jul 20 '23 05:07
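
A minimal sketch of the split described in the comment above, assuming the vector index only returns row ids and everything else lives in a relational store. This is not klavinski's code: `BruteForceIndex` is a hypothetical stand-in that does exact search instead of HNSW, and the `documents` map stands in for a SQLite table keyed by row id. No hnswlib, IndexedDB or SQLite API is used here.

```typescript
// Cosine similarity; assumes both vectors are non-zero and of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical stand-in for the vector index: exact brute-force search.
class BruteForceIndex {
  private ids: number[] = [];
  private vectors: number[][] = [];

  add(id: number, vector: number[]): void {
    this.ids.push(id);
    this.vectors.push(vector);
  }

  // Returns the ids of the k stored vectors most similar to the query.
  search(query: number[], k: number): number[] {
    return this.ids
      .map((id, i) => ({ id, score: cosine(this.vectors[i], query) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((r) => r.id);
  }
}

// Stand-in for a SQLite table: row id -> title (made-up data).
const documents = new Map<number, string>([
  [1, "cats"],
  [2, "dogs"],
  [3, "kittens"],
]);

const index = new BruteForceIndex();
index.add(1, [1, 0]);
index.add(2, [0, 1]);
index.add(3, [0.9, 0.1]);

// Step 1: the index answers "which ids are closest?".
const hitIds = index.search([1, 0], 2);
// Step 2: the relational side resolves ids to rows.
const titles = hitIds.map((id) => documents.get(id));
console.log(hitIds, titles);
```

The point of the split is that the two stores share only the row id, so either side can be swapped out independently.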

> I did not update this issue, but for those still looking for a solution, I successfully used a combination of hnswlib, which stores the embeddings in IndexedDB, and SQLite for the rest.

I'd appreciate it if you could share the solution. Any public URL?

Thanks.

limcheekin • Aug 24 '23 12:08

Using hnswlib-wasm is straightforward, except for tuning the parameters. This is the best explanation I have found.

Today, I have discovered another web vector database with persistent storage: Victor. It uses OPFS instead of IndexedDB.

klavinski • Sep 14 '23 08:09

I found another one which looks promising and timely: SurrealDB, which had its 1.0.0 release just a few days ago.

It supports the following features according to the docs:

After losing some hair over the past 2 days :), I finally got surrealdb.wasm working with indxdb today, with a simple test of a vector function.

limcheekin • Sep 20 '23 07:09