azure-search-vector-samples

Feature Request: Collection of vector fields

Open kirk-marple opened this issue 2 years ago • 5 comments

For our use case, we are ingesting long documents and audio transcripts. The amount of text we're starting with exceeds the 8K-token input limit of the Ada embedding model.

So we need to create multiple embeddings from each piece of content.

Since we can only store one vector per search document, I had to come up with a hacky solution that stores 'n' search documents per piece of content. (Basically one parent search document and 'n' child search documents, where n == # of chunks.)
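For readers trying the same workaround, here is a minimal sketch of the parent/child layout described above. It is not the poster's actual code. The field names (`parentId`, `chunkIndex`, `chunkVector`, `isParent`) are hypothetical, the `embed` callable stands in for whatever embedding call you use, and splitting on words is only a rough approximation of a token-based limit.

```python
from typing import Callable


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping word windows (a crude stand-in for token-based chunking)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_search_documents(
    content_id: str,
    title: str,
    text: str,
    embed: Callable[[str], list[float]],
    max_words: int = 200,
    overlap: int = 20,
) -> list[dict]:
    """Return one parent document plus one child document per chunk.

    Each child carries exactly one top-level vector field, which is what
    the index accepts today. All field names here are hypothetical.
    """
    parent = {"id": content_id, "title": title, "isParent": True}
    children = [
        {
            "id": f"{content_id}-{i}",
            "parentId": content_id,  # link back to the parent document
            "chunkIndex": i,
            "chunk": chunk,
            "chunkVector": embed(chunk),
            "isParent": False,
        }
        for i, chunk in enumerate(chunk_text(text, max_words, overlap))
    ]
    return [parent] + children
```

At query time the vector search runs against the child documents, and `parentId` is used to map each hit back to the original content.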

If the Cog Search index could support a collection of complex types, each of which includes a vector, it would make this scenario much cleaner.

Currently, it errors with "Only a top-level field of the index can be a vector field."

kirk-marple May 24 '23 06:05

Thanks for the feedback. We are actively working on this feature but have no ETA at this time.

farzad528 May 24 '23 19:05

Excellent. We are also interested in the same.

smharvey May 24 '23 23:05

This seems like a really key problem to solve. How many real-world documents are small enough that you could include them directly in a prompt? Carving up a document (on ingestion or indexing) so that you can find and retrieve just the relevant portions for a prompt seems like a blocking requirement. I wonder how others are solving this problem with Cognitive Search?
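As a rough illustration of the "retrieve just the relevant portions" step, the sketch below picks the highest-scoring chunks that fit a prompt budget. It assumes the search has already returned `(score, chunk_text)` pairs. The function name and the word-count budget are hypothetical simplifications, and a real prompt budget would be counted in tokens.

```python
def select_chunks_for_prompt(
    hits: list[tuple[float, str]], max_words: int = 1500
) -> list[str]:
    """Greedily keep the best-scoring chunks whose combined length fits the budget."""
    selected = []
    used = 0
    for _score, chunk in sorted(hits, key=lambda h: h[0], reverse=True):
        n_words = len(chunk.split())
        if used + n_words > max_words:
            continue  # skip chunks that would overflow; a smaller one may still fit
        selected.append(chunk)
        used += n_words
    return selected
```

The greedy pass favors relevance over document order. If the chunks need to read in sequence, they can be re-sorted by their position in the source document before being placed in the prompt.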

Related to this, I found these two fields in an index created by the Custom Answering service. My plan was to use embeddings to build a similar QnA service, as suggested in a lot of Microsoft slides, but seeing those fields, I'm wondering whether the pattern is already being implemented in the Custom Answering service.

[image]

davidjrh May 25 '23 12:05