PR: SK VectorDB connector work - Merging Forked branch & PR from Fork to SK branch
Motivation and Context
Provide an SK connector for the Qdrant vector database that uses the kernel's memory architecture.
This pull request adds the ability for the Semantic Kernel to persist embeddings/memory to external vector databases like Qdrant. It includes modifications and additions that integrate with the current SK Memory architecture and with the SDK/API of the various vector databases. The VectorDB skill/connector changes are significant and likely larger than initially estimated.
Please help reviewers and future users by providing the following information:
- Why is this change required? This change allows SK developers/users to persist embeddings/memory to a vector database like Qdrant and to search them there.
- What problem does it solve? It adds long-term memory/embedding storage in a vector database to the Semantic Kernel.
- What scenario does it contribute to? Long-term, scalable memory storage, plus retrieval, search, and filtering of embeddings.
- If it fixes an open issue, please link to the issue here. N/A
Description
This PR currently includes a connector for the Qdrant VectorDB only. Out of scope: this PR removes the initial Milvus VectorDB addition and the generic VectorDB client interfaces that were meant to provide consistency across external vector databases. That concept will be covered in a forthcoming design and PR.
Additions and modifications to the custom SK Qdrant.Dotnet SDK
- Removing VectorRecord and VectorMetaData and replacing them with VectorRecordData, which inherits from IEmbeddingwithMetadata, IDataStore, IEmbeddingIndex
- Adding FetchVectorsRequest and FetchVectorsResponse classes for calling the Points/Scroll API of the Qdrant vector database to get ALL vectors in a collection by collection name only, without a vector ID. These were not in the original custom SDK
- Updating the QdrantDB connection constructor to accept a port, so that both the REST and gRPC Qdrant APIs (which expose the same calls, per Qdrant) are supported, for performance and binary data
- Adding an internal Points class to support the various Qdrant Points API calls
- Adding a default vector size, matching the ADA model, that is used when no size is passed in
- Update of IVectorDbCollection.cs to support VectorRecordData class and DataEntry<VectorRecordData>
- Adding a GetAllVectorsAsync method to QdrantCollections
- Update to QdrantCollections.cs to support the IVectorDBCollections interface changes
- Changing almost all signatures of the Qdrant methods to return DataEntry<VectorRecordData> instead of VectorRecord
- Changing SearchVectorsResponse.cs
- Adding FetchAllCollectionNamesRequest and FetchAllCollectionNamesResponse classes for calling the List Collections API of the Qdrant vector database to get the names of all collections. These were not in the initial SDK
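To make the Points/Scroll item above concrete, here is a minimal sketch of how a scroll request for one page of vectors could be assembled. It is not the code in this PR: `BuildScrollUri` and `BuildScrollBody` are hypothetical helpers, the field names (`limit`, `with_payload`, `with_vector`, `offset`) reflect my understanding of the Qdrant REST API and should be checked against the Qdrant version in use, and port 6333 is assumed to be the REST port.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not part of this PR): URI of the scroll endpoint
// for one collection, given the host and the REST port.
static Uri BuildScrollUri(string host, int port, string collectionName) =>
    new Uri($"{host}:{port}/collections/{Uri.EscapeDataString(collectionName)}/points/scroll");

// Hypothetical helper (not part of this PR): JSON body for one scroll page.
// An empty offset means "start from the first point"; otherwise it is
// assumed to be the point ID returned by the previous page.
static string BuildScrollBody(int limit, string offset = "")
{
    var body = new Dictionary<string, object>
    {
        ["limit"] = limit,          // page size
        ["with_payload"] = true,    // include metadata
        ["with_vector"] = true,     // include the embedding itself
    };
    if (offset.Length > 0)
    {
        body["offset"] = offset;
    }
    return JsonSerializer.Serialize(body);
}

// Usage: print the request target and the body of the first page.
Console.WriteLine(BuildScrollUri("http://localhost", 6333, "my-collection"));
Console.WriteLine(BuildScrollBody(limit: 100));
```

Fetching a collection in fixed-size pages like this keeps any single response bounded, which matters when the goal is to get ALL vectors by collection name.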
Additions to Skills.Memory.VectorDB
- Adding new namespace: Skills.Memory.VectorDB
- Adding Qdrant VectorDB client for SK, QdrantVectorDB
- Adding ILongTermMemoryStore interface for VectorDB
- Adding the Qdrant memory store class, QdrantMemoryStore.cs, with new methods for connecting to a Qdrant DB in the cloud and retrieving collections and embeddings from it
Note
- This builds, but with several warnings: the unchanged existing SDK code has its own Logger and other functionality that SK now provides. I am requesting a review of whether that code can be removed.
- Based on comments in the fork, several of these warnings have been addressed.
- Question: should GetCollections retrieve ALL collections from an external database? Because external vector databases store embeddings, this could be a significant data request. I would like to discuss a possible SK-defined limit on how much is pulled. For now, this PR adds the Qdrant REST API call for getting collection names.
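As a starting point for the limit discussion above, here is a minimal sketch of capping the number of collection names handed back to the caller. It is not the code in this PR: `ParseCollectionNames` and `maxNames` are hypothetical, and the response shape (`result.collections[].name`) reflects my understanding of the Qdrant List Collections response.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper (not part of this PR): extracts at most maxNames
// collection names from a List Collections response body.
static List<string> ParseCollectionNames(string json, int maxNames)
{
    using var doc = JsonDocument.Parse(json);
    return doc.RootElement
        .GetProperty("result")
        .GetProperty("collections")
        .EnumerateArray()
        .Select(c => c.GetProperty("name").GetString() ?? "")
        .Take(maxNames)     // cap applied on the client side
        .ToList();          // materialize before the document is disposed
}

// Usage with a hand-written sample response.
var sample = "{\"result\":{\"collections\":[{\"name\":\"a\"},{\"name\":\"b\"},{\"name\":\"c\"}]},\"status\":\"ok\",\"time\":0.0}";
Console.WriteLine(string.Join(", ", ParseCollectionNames(sample, maxNames: 2))); // prints "a, b"
```

A cap like this only bounds what SK returns to its caller. It does not shrink the HTTP response itself, so a server-side limit would need support from the database API.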
Contribution Checklist
- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows SK Contribution Guidelines (https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
- [X] The code follows the .NET coding conventions (https://learn.microsoft.com/dotnet/csharp/fundamentals/coding-style/coding-conventions) verified with dotnet format
- [ ] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone :smile:
Running a diff check to ensure all updated/committed files were successfully merged from the fork.