semantic-kernel icon indicating copy to clipboard operation
semantic-kernel copied to clipboard

Pinecone Client and Memory Store

Open Kevdome3000 opened this issue 2 years ago • 1 comments

Motivation and Context

  1. Why is this change required? SK currently does not have a Pinecone Connector
  2. What problem does it solve? Allows users to use Pinecone as a Memory Store
  3. What scenario does it contribute to? Memory

Description

An implementation of IMemoryStore for Pinecone Vector database.

Contribution Checklist

  • [x] The code builds clean without any errors or warnings
  • [x] The PR follows SK Contribution Guidelines (https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
  • [x] The code follows the .NET coding conventions (https://learn.microsoft.com/dotnet/csharp/fundamentals/coding-style/coding-conventions) verified with dotnet format
  • [x] All unit tests pass, and I have added new tests where possible
  • [x] I didn't break anyone :smile:

I still need to add unit tests and example usage but wanted to get the conversation started and garner feedback before finalizing

Kevdome3000 avatar Apr 24 '23 04:04 Kevdome3000

@Kevdome3000 please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

Kevdome3000 avatar Apr 24 '23 04:04 Kevdome3000

After playing with a few of the sample apps and getting a better understanding of how the IMemoryStore interface is being utilized, I think a refactor makes sense.

For those familiar with Pinecone, namespaces are really powerful and for those who aren't, this is from their documentation:

Pinecone allows you to partition the vectors in an index into namespaces. Queries and other operations are then limited to one namespace, so different requests can search different subsets of your index.

For example, you might want to define a namespace for indexing articles by content, and another for indexing articles by title. For a complete example, see: Semantic Text Search (Example).

Every index is made up of one or more namespaces. Every vector exists in exactly one namespace.

Imagine a namespace per Kernel or SemanticTextMemory that can be cleared after a chat session while keeping other semantic data that exists in a Pinecone Index in tact. That's the ideal implementation in my opinion.

This PR in it's current state doesn't allow for that due to the fact that a user can also have multiple Pinecone Indexes, so the "collection"/"collectionName" parameters used throughout the IMemoryStore/ ISemanticTextMemory interface becomes ambiguous.

The easiest solution is to agree that a user is restricted to a single index per PineconeMemoryStore/ PineconeClient instance so that the name of the index no longer needs to be passed as a parameter and the collectionName/collection can be synonymous with namespace.

I hope that makes sense.

If on board let me know. I can turn that around quickly

Kevdome3000 avatar Apr 27 '23 06:04 Kevdome3000

After playing with a few of the sample apps and getting a better understanding of how the IMemoryStore interface is being utilized, I think a refactor makes sense.

For those familiar with Pinecone, namespaces are really powerful and for those who aren't, this is from their documentation:

Pinecone allows you to partition the vectors in an index into namespaces. Queries and other operations are then limited to one namespace, so different requests can search different subsets of your index. For example, you might want to define a namespace for indexing articles by content, and another for indexing articles by title. For a complete example, see: Semantic Text Search (Example). Every index is made up of one or more namespaces. Every vector exists in exactly one namespace.

Imagine a namespace per Kernel or SemanticTextMemory that can be cleared after a chat session while keeping other semantic data that exists in a Pinecone Index in tact. That's the ideal implementation in my opinion.

This PR in it's current state doesn't allow for that due to the fact that a user can also have multiple Pinecone Indexes, so the "collection"/"collectionName" parameters used throughout the IMemoryStore/ ISemanticTextMemory interface becomes ambiguous.

The easiest solution is to agree that a user is restricted to a single index per PineconeMemoryStore/ PineconeClient instance so that the name of the index no longer needs to be passed as a parameter and the collectionName/collection can be synonymous with namespace.

I hope that makes sense.

If on board let me know. I can turn that around quickly

I'd like to address this if all issues are resolved

Kevdome3000 avatar Apr 29 '23 01:04 Kevdome3000

Hey @Kevdome3000, did you intentionally close this PR, or was that just GitHub being helpful?

If you need help with a merge or something, don't hesitate to reach out.

shawncal avatar May 02 '23 05:05 shawncal

Oh no! That was not intentional. How can we recover? I created a new PR, will tag you both @shawncal @dmytrostruk

Kevdome3000 avatar May 02 '23 16:05 Kevdome3000

Thanks @Kevdome3000 ! Let's switch to the new PR then. We are close to merging this to main.

dmytrostruk avatar May 02 '23 18:05 dmytrostruk