Vector embeddings on logs
Vector embeddings support
Vector should support an embedding transform through VRL.
The only configuration needed would be the embedding endpoint to use and where to store the output.
Why? Vector embeddings are a cheap way to power semantic search over data. As a step towards AI-powered monitoring, Vector should support translating logs into embedding vectors and adding them to the event as a field.
Sample feature:
# In vector.toml, configure the embedding endpoint, which accepts text and returns an embedding vector
embedding_endpoint = "https://api.openai.com/v1/embeddings"
embedding_endpoints_api_key = "sk-..."
# Sink: some data stores, such as Pinecone and Weaviate, support vector search natively
# Those sinks might need to be supported separately
# In VRL, call the function like so; it should pull the API key from vector.toml
.embedding = log_to_embedding(.log)
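To make the request concrete, here is a minimal Python sketch of what such a transform might do under the hood. It is not Vector or VRL code. The helper names (`build_embedding_request`, `parse_embedding_response`, `add_embedding`) and the model name are assumptions for illustration; the request and response shapes follow the OpenAI embeddings endpoint from the sample config.

```python
import json
import urllib.request

# Endpoint taken from the sample config; the model name is an assumption.
ENDPOINT = "https://api.openai.com/v1/embeddings"
MODEL = "text-embedding-3-small"


def build_embedding_request(text, api_key, endpoint=ENDPOINT, model=MODEL):
    """Build (but do not send) the HTTP request that embeds one log message."""
    body = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embedding_response(raw):
    """Extract the first embedding from an OpenAI-style JSON response body."""
    return json.loads(raw)["data"][0]["embedding"]


def add_embedding(event, embed, source="log", target="embedding"):
    """Return a copy of the event with embed(event[source]) stored under target.

    `embed` is any callable mapping text to a list of floats, so the
    hypothetical transform stays independent of the embedding provider.
    """
    enriched = dict(event)
    enriched[target] = embed(event[source])
    return enriched
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and feeding the response body to `parse_embedding_response`. Passing `embed` in as a callable keeps open the option of swapping the remote endpoint for a local model.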
Use Cases
As a user, I can search through my logs using natural language.
The end result is intelligent, natural-language search over logs.
As a developer, I can implement semantic search quickly with vector.dev.
This makes for a better developer experience.
For example, a search query like "give me the failures from amazon in the last three hours" can return the most relevant logs.
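A rough sketch of how "most relevant" could be computed downstream, assuming each stored event already carries an `embedding` field and the query has been embedded with the same model. The function names are hypothetical, and cosine similarity is one common ranking choice rather than anything the proposal prescribes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_embedding, events, k=3):
    """Return the k events whose embeddings are most similar to the query."""
    ranked = sorted(
        events,
        key=lambda event: cosine_similarity(query_embedding, event["embedding"]),
        reverse=True,
    )
    return ranked[:k]
```

The "in the last three hours" part of the query would more likely be handled as an ordinary timestamp filter applied before or after ranking, since embeddings capture meaning rather than time ranges.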
Attempted Solutions
No response
Proposal
I propose investigating the referenced services to determine whether this would be a quick implementation, is not worth the investment, or is already supported through a different configuration.
References
This is an example endpoint that generates embedding vectors from text input. There are others, but OpenAI is the most prevalent solution at the moment.
- https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
This is a service specialized in vector search
- https://weaviate.io/
Pinecone is also a popular vector store
- https://www.pinecone.io/learn/vector-database/
Users can implement vector search using just a SQL database as well.
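As an illustration of the SQL-only route, the sketch below uses Python's built-in `sqlite3` module with embeddings stored as JSON text and a user-defined `cosine_sim` SQL function. The table layout and helper names are assumptions, and the query is a brute-force scan over every row, so it only suits small data sets; dedicated extensions such as pgvector exist for larger ones.

```python
import json
import math
import sqlite3


def cosine_sim(a_json, b_json):
    """Cosine similarity between two vectors stored as JSON arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_store():
    """Create an in-memory database with a logs table and the cosine_sim SQL function."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("cosine_sim", 2, cosine_sim)
    conn.execute("CREATE TABLE logs (message TEXT, embedding TEXT)")
    return conn


def insert_log(conn, message, embedding):
    """Store a log message together with its embedding serialized as JSON."""
    conn.execute(
        "INSERT INTO logs VALUES (?, ?)", (message, json.dumps(embedding))
    )


def search(conn, query_embedding, k=3):
    """Return the k messages most similar to the query embedding."""
    rows = conn.execute(
        "SELECT message FROM logs ORDER BY cosine_sim(embedding, ?) DESC LIMIT ?",
        (json.dumps(query_embedding), k),
    )
    return [row[0] for row in rows]
```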
Version
No response
Related project: https://github.com/Anush008/fastembed-rs
Value prop:
- Users will want to process logs closer to the edge to catch errors quickly, e.g. to decide whether something amounts to a security flaw.
- Vector embeddings can be supported through VRL and at the edge using ONNX, so it's possible to demo this within vrlplayground as well.
- I imagine someone's VRL function handling an event: if the event is a certain error, it adds the label "error": "security flaw from hot path library", accompanied by "error_embedding": [vector[78]]. Downstream, someone's UI or search could then answer a question like "do we have any errors recently in the hotpath?"