vector Vector embeddings on logs

Vector embeddings support

Vector should support an embedding transform through VRL.

The only configuration needed would be the embedding endpoint to use and the field in which to store the output.

Why? Vector embeddings are a cheap way to power semantic search over data. As a step towards AI-powered monitoring, Vector should support translating logs into embeddings and adding them to the event as a field.

Sample feature:

# In vector.toml, configure the embedding endpoint, which accepts text and returns an embedding vector
embedding_endpoint = "https://api.openai.com/v1/embeddings"
embedding_endpoints_api_key = "sk-..."


# Sink: some data stores, such as Pinecone and Weaviate, support vector search natively.
# Those sinks might need to be supported separately.

# In VRL, call it like so; the API key is read from vector.toml
.embedding = log_to_embedding(.log)
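
To make the request concrete, here is a rough sketch in Python of what a `log_to_embedding` call might do under the hood. It is not how Vector or VRL works today: `log_to_embedding` is the function proposed above, the model name is an assumption, and the request and response shapes assume an OpenAI-style embeddings endpoint (JSON body with `input` and `model`, result under `data[0].embedding`).

```python
import json
import urllib.request

# Assumptions: an OpenAI-style embeddings endpoint and model name.
# `log_to_embedding` is the proposed function, not an existing VRL function.
EMBEDDING_ENDPOINT = "https://api.openai.com/v1/embeddings"
EMBEDDING_MODEL = "text-embedding-3-small"


def build_embedding_request(text, api_key,
                            endpoint=EMBEDDING_ENDPOINT,
                            model=EMBEDDING_MODEL):
    """Build (but do not send) the POST request for one log message."""
    body = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embedding_response(raw_body):
    """Extract the embedding (a flat list of floats) from the JSON response."""
    payload = json.loads(raw_body)
    return payload["data"][0]["embedding"]


def log_to_embedding(event, api_key, send=urllib.request.urlopen):
    """Return a copy of the event with an `embedding` field added.

    `send` is injectable so the function can be exercised without network access.
    """
    request = build_embedding_request(event["log"], api_key)
    with send(request) as response:
        vector = parse_embedding_response(response.read())
    return {**event, "embedding": vector}
```

A real transform would also need batching, retries, and rate limiting, which this sketch leaves out.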

Use Cases

As a user, I can search through my logs using natural language.

The end result gives users intelligent, natural-language search over their logs.

As a developer, I can implement semantic search quickly with vector.dev.

This makes for a better developer experience.

For example:

A search query like "give me the failures from amazon in the last three hours" can return the most relevant logs.
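
As a sketch of how such a query could be answered once events carry an embedding field: embed the query text with the same model, then rank events by cosine similarity. The example below is Python with made-up 3-dimensional vectors standing in for real embeddings; the `embedding` field name and the `top_k` helper are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, events, k=2):
    """Return the k events whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_embedding, e["embedding"]), e) for e in events]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [event for _, event in scored[:k]]


# Made-up vectors: real embeddings have hundreds of dimensions or more.
events = [
    {"log": "S3 PutObject failed: 503 SlowDown", "embedding": [0.9, 0.1, 0.0]},
    {"log": "user login succeeded", "embedding": [0.0, 0.2, 0.9]},
    {"log": "EC2 health check timeout", "embedding": [0.8, 0.3, 0.1]},
]
query_embedding = [1.0, 0.2, 0.0]  # stand-in for the embedded query text

results = top_k(query_embedding, events, k=2)
for event in results:
    print(event["log"])
```

The "last three hours" part of the query would be an ordinary timestamp filter applied before or after the ranking; it is not something the embedding handles.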

Attempted Solutions

No response

Proposal

I propose investigating the referenced services to see whether this is a quick implementation, not worth the investment, or already supported through some other configuration.

References

This is an example endpoint that generates embedding vectors from text input. There are others, but OpenAI is the most prevalent solution at the moment.

https://platform.openai.com/docs/guides/embeddings/what-are-embeddings

This is a service specialized in vector search

https://weaviate.io/

Pinecone is also a popular vector store

https://www.pinecone.io/learn/vector-database/

Users can implement vector search using just a SQL database as well.
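
A minimal sketch of the SQL-only approach, using Python's built-in `sqlite3`: embeddings are stored as JSON text and ranked with a user-defined SQL function. The table layout, the `cosine_sim` function name, and the toy 3-dimensional vectors are all assumptions for illustration. This is a brute-force scan, so it only suits small datasets; extensions such as pgvector for Postgres exist for larger ones.

```python
import json
import math
import sqlite3


def cosine_sim(a_json, b_json):
    """Cosine similarity between two vectors stored as JSON arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a two-argument scalar function.
conn.create_function("cosine_sim", 2, cosine_sim)
conn.execute("CREATE TABLE logs (message TEXT, embedding TEXT)")

# Made-up vectors standing in for real embeddings.
rows = [
    ("S3 PutObject failed: 503 SlowDown", [0.9, 0.1, 0.0]),
    ("user login succeeded", [0.0, 0.2, 0.9]),
    ("EC2 health check timeout", [0.8, 0.3, 0.1]),
]
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [(message, json.dumps(vector)) for message, vector in rows],
)

query = json.dumps([1.0, 0.2, 0.0])  # stand-in for the embedded query text
ranked = conn.execute(
    "SELECT message FROM logs ORDER BY cosine_sim(embedding, ?) DESC LIMIT 2",
    (query,),
).fetchall()
```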

Version

No response

Oct 08 '23 18:10 jonathanpv

Related project: https://github.com/Anush008/fastembed-rs

Mar 24 '24 16:03 jonathanpv

Value prop:

Users will want to process logs closer to the edge to catch errors quickly, e.g. to decide whether something is a security flaw or not.
Vector embeddings can be supported through VRL and at the edge using ONNX, so it's possible to demo this within the VRL playground as well.
I imagine someone's VRL function handling an event: if the event is a certain error, it adds a label "error": "security flaw from hot path library", with an accompanying "error_embedding": [vector[78]]

Then someone's downstream UI or search could ask:

"do we have any errors recently in the hotpath?"

Apr 13 '24 05:04 jonathanpv