Indexing with vectors / vector search

Open joepio opened this issue 1 year ago • 0 comments

LLMs have enabled a new type of search: vector search. Instead of matching a word or string, vector search turns entries and queries into vectors (arrays of numbers) and returns the entries whose vectors are closest to the query's vector.
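To make the idea concrete, here is a minimal brute-force sketch in plain Rust (std only). It assumes the embeddings already exist; in practice they would come from an embedding model, which is not shown. The names `cosine_similarity` and `top_k` are hypothetical and not part of AtomicServer.

```rust
/// Cosine similarity between two vectors of the same dimension.
/// Returns 0.0 when either vector has zero length (norm).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force search: score every entry against the query and
/// return the `k` most similar ones, best match first.
/// `entries` holds hypothetical (resource id, embedding) pairs.
fn top_k<'a>(
    query: &[f32],
    entries: &'a [(String, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = entries
        .iter()
        .map(|(id, vector)| (id.as_str(), cosine_similarity(query, vector)))
        .collect();
    // Sort by similarity, highest first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

This scans every entry per query, so it is only meant to show what "closest" means here. A real index would use an approximate nearest-neighbor structure instead of a linear scan.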

RAG (retrieval-augmented generation) search would be very useful for Atomic Assistant.

Use cases

Finding things by their semantic meaning (improved search)
Searching through images, videos, audio, etc. using embeddings
Finding relevant resources
Injecting context into an LLM (#951)

Approaches

Using sled / KV store

I don't think it's possible to do meaningful nearest-neighbor search using sled's KV / BTreeMap data structure. Searching in a KV store is done with range queries over lexicographically sorted keys, and I can't see how we can turn a high-dimensional vector into a meaningful key such that close neighbors are also lexicographically close.
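A small sketch of the problem, using std's `BTreeMap` as a stand-in for sled's ordered keyspace (this does not use sled itself). The 2-dimensional `u8` vectors, the sample data, and all function names are assumptions made purely for illustration; real embeddings have hundreds of dimensions, which only makes the mismatch worse.

```rust
use std::collections::BTreeMap;

/// A toy "vector" used directly as a key: two quantized coordinates.
/// Arrays compare lexicographically, like byte keys in a KV store.
type Key = [u8; 2];

/// Squared Euclidean distance between two toy vectors.
fn squared_distance(a: &Key, b: &Key) -> u32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = i32::from(*x) - i32::from(*y);
            (d * d) as u32
        })
        .sum()
}

/// What a range query gives us: the keys directly before and after
/// the query in lexicographic order.
fn lexicographic_neighbors<'a>(
    index: &'a BTreeMap<Key, String>,
    query: &Key,
) -> Vec<&'a str> {
    let before = index.range(..*query).next_back();
    let after = index.range(*query..).next();
    before
        .into_iter()
        .chain(after)
        .map(|(_, id)| id.as_str())
        .collect()
}

/// What we actually want: the entry with the smallest distance,
/// found here by scanning every key.
fn true_nearest<'a>(index: &'a BTreeMap<Key, String>, query: &Key) -> Option<&'a str> {
    index
        .iter()
        .min_by_key(|(key, _)| squared_distance(key, query))
        .map(|(_, id)| id.as_str())
}

/// Hypothetical sample data.
fn sample_index() -> BTreeMap<Key, String> {
    let mut index = BTreeMap::new();
    index.insert([9, 0], "c".to_string());
    index.insert([10, 250], "a".to_string());
    index.insert([11, 100], "b".to_string());
    index
}
```

For the query `[10, 100]`, the keys adjacent in sort order are `[9, 0]` and `[10, 250]`, both roughly 100 or more units away, while the actual nearest vector `[11, 100]` is one unit away but sorts after them. The first coordinate dominates the ordering, so closeness in the other dimensions is lost.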

Using an external vector DB

The whole point of AtomicServer is that you don't need anything else. I don't want external dependencies. I want it to be a small single binary that you can just run and that gives you all you need.

OasysDB

An embeddable vector database in Rust. That's the spirit!

LanceDB

An embeddable vector search DB that seems to be really fast and also has full-text search (using tantivy). It's not an OLTP database, though.

Oct 30 '24 10:10 joepio