Indexing with vectors / vector search
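LLMs have enabled a new type of search: vector search. Instead of matching a word or string, vector search uses an embedding model to turn entries and queries into vectors (arrays of numbers), and then returns the entries whose vectors are closest to the query's vector.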
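Here is a minimal sketch of the core idea. Each entry is stored with an embedding vector, and a query is answered by comparing its vector against every stored vector with cosine similarity and returning the closest one. The vectors below are hand-written toy values (an assumption; a real embedding model outputs hundreds of dimensions), and `nearest` / `cosine_similarity` are hypothetical helper names.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Exhaustive (brute-force) nearest-neighbor search: compares the query
/// against every stored entry and returns the id of the most similar one.
fn nearest<'a>(entries: &'a [(&'a str, Vec<f32>)], query: &[f32]) -> Option<&'a str> {
    entries
        .iter()
        .map(|(id, v)| (*id, cosine_similarity(v, query)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(id, _)| id)
}

fn main() {
    // Toy 3-dimensional "embeddings" (hypothetical values).
    let entries = vec![
        ("cat", vec![0.9, 0.1, 0.0]),
        ("dog", vec![0.8, 0.3, 0.1]),
        ("car", vec![0.0, 0.2, 0.9]),
    ];
    let query = vec![0.85, 0.15, 0.05];
    println!("{:?}", nearest(&entries, &query)); // prints Some("cat")
}
```

This brute-force scan costs one comparison per stored entry for every query. Dedicated vector databases typically use approximate nearest-neighbor (ANN) indexes, such as HNSW, to avoid scanning everything.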
Retrieval-augmented generation (RAG), which uses this kind of search to fetch relevant context for an LLM, would be very useful for Atomic Assistant.
Use cases
- Finding things by their semantic meaning (improved search)
- Searching through images, videos, audio, etc. using embeddings
- Finding relevant resources
- Injecting context into an LLM #951
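Injecting context into an LLM (RAG) roughly means: embed the question, retrieve the k most similar entries, and paste their text into the prompt. Below is a minimal sketch. It assumes embeddings are already L2-normalized, so a plain dot product equals cosine similarity. The `Doc` struct, `top_k`, `build_prompt`, and the prompt template are all hypothetical, and no actual LLM or embedding model is called.

```rust
/// A stored entry: its text and a unit-length (L2-normalized) embedding.
struct Doc {
    text: String,
    embedding: Vec<f32>,
}

/// For unit-length vectors, the dot product equals cosine similarity.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Returns the `k` documents most similar to `query`, best match first.
fn top_k<'a>(docs: &'a [Doc], query: &[f32], k: usize) -> Vec<&'a Doc> {
    let mut scored: Vec<(f32, &Doc)> =
        docs.iter().map(|d| (dot(&d.embedding, query), d)).collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, d)| d).collect()
}

/// Builds a prompt that prepends the retrieved texts as context (hypothetical template).
fn build_prompt(question: &str, context: &[&Doc]) -> String {
    let mut prompt = String::from("Answer using only the context below.\n\nContext:\n");
    for doc in context {
        prompt.push_str("- ");
        prompt.push_str(&doc.text);
        prompt.push('\n');
    }
    prompt.push_str("\nQuestion: ");
    prompt.push_str(question);
    prompt
}

fn main() {
    // Toy 2-D unit vectors standing in for real embeddings.
    let docs = vec![
        Doc { text: "AtomicServer runs as a single binary.".into(), embedding: vec![1.0, 0.0] },
        Doc { text: "Embeddings are arrays of numbers.".into(), embedding: vec![0.0, 1.0] },
        Doc { text: "Vector search finds similar entries.".into(), embedding: vec![0.8, 0.6] },
    ];
    // Pretend this is the normalized embedding of the user's question.
    let query = [0.96, 0.28];
    let context = top_k(&docs, &query, 2);
    println!("{}", build_prompt("How is AtomicServer deployed?", &context));
}
```

Only the retrieved snippets reach the model, which keeps the prompt small even when the knowledge base is large.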
Approaches
Using sled / KV store
I don't think it's possible to do meaningful nearest-neighbor search using sled's KV / BTreeMap data structure. Searching in a KV store is done with range queries over lexicographically sorted keys, and I can't see how we could turn a high-dimensional vector into a meaningful key where close neighbors are also lexicographically close.
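To illustrate the problem, here is a sketch of one obvious attempt: quantize a 2-D vector to a 16×16 grid and interleave the bits of its coordinates into a Z-order (Morton) key. `std::collections::BTreeMap` stands in for sled here (an assumption; sled orders keys as raw bytes, and a single-byte key sorts the same way as the integer). `morton_key` is a hypothetical helper. Points A and B are adjacent on the grid, yet a far-away point C lands between them in key order, so a range scan around B reaches C before A. With hundreds of dimensions, preserving locality in a one-dimensional key only gets harder.

```rust
use std::collections::BTreeMap;

/// Interleaves the bits of two 4-bit grid coordinates into an 8-bit Z-order
/// (Morton) key: x occupies the even bit positions, y the odd ones.
fn morton_key(x: u8, y: u8) -> u8 {
    let mut key = 0u8;
    for i in 0..4 {
        key |= ((x >> i) & 1) << (2 * i);
        key |= ((y >> i) & 1) << (2 * i + 1);
    }
    key
}

fn main() {
    // Grid cells standing in for quantized 2-D vectors.
    let points = [("A", 7u8, 7u8), ("B", 8, 8), ("C", 0, 15)];

    // BTreeMap keeps keys sorted, like a KV store's ordered keys.
    let mut index = BTreeMap::new();
    for (name, x, y) in points {
        index.insert(morton_key(x, y), name);
    }

    // Prints A (63), then C (170), then B (192): C sits between the two neighbors.
    for (key, name) in &index {
        println!("{key:3} -> {name}");
    }
}
```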
Using an external vector DB
The whole point of AtomicServer is that you don't need anything else. I don't want external dependencies: I want it to be a small single binary that you can just run and that gives you all you need.
OasysDB
An embeddable vector database in Rust. That's the spirit!
LanceDB
An embeddable vector search DB. It seems to be really fast and also has full-text search (using tantivy). It's not OLTP.