Bug: dotnet Issue with MemoryRecord.ReferenceRecord Method and Clarification on SaveReferenceAsync vs SaveInformationAsync
Dear Semantic Kernel Maintainers,
I hope this message finds you well. I’ve been exploring your project and would like to report an issue I’ve encountered.
While working with the “09-memory-with-chroma.ipynb” notebook, I used the Qdrant database instead of Chroma and noticed a problem (probably unrelated to Qdrant itself). When I save references to Qdrant using the SaveReferenceAsync method, the text field is always set to an empty string, so the resulting entry in the Qdrant database has no text.
Upon reviewing the code, I found that the MemoryRecord.ReferenceRecord method is explicitly setting the text field to string.Empty. Here is the relevant part of the code:
    public static MemoryRecord ReferenceRecord(
        // ... parameters ...
        string? description,
        ReadOnlyMemory<float> embedding,
        // ... other parameters ...
    )
    {
        return new MemoryRecord(
            new MemoryRecordMetadata
            (
                // ... other metadata fields ...
                description: description ?? string.Empty,
                text: string.Empty, // This line sets the text field to empty
                // ... other metadata fields ...
            ),
            embedding,
            // ... other arguments ...
        );
    }
To address this issue, I suggest updating the method to accept a text parameter and use it within MemoryRecordMetadata, like so:
    public static MemoryRecord ReferenceRecord(
        // ... parameters ...
        string text, // Add this parameter
        ReadOnlyMemory<float> embedding,
        string? description = null,
        // ... other parameters ...
    )
    {
        return new MemoryRecord(
            new MemoryRecordMetadata
            (
                // ... other metadata fields ...
                description: description ?? string.Empty,
                text: text ?? string.Empty, // Use the text parameter here
                // ... other metadata fields ...
            ),
            embedding,
            // ... other arguments ...
        );
    }
Could you please consider incorporating this change into the project, or provide further guidance on this matter?
Lastly, I would appreciate some clarification on the distinct roles of the SaveReferenceAsync and SaveInformationAsync methods. They seem to be designed for different kinds of data storage, but I would like to understand the specific use cases for each.
Thank you for your time and assistance.
Hi @vasemax, thanks for trying out Semantic Kernel and thanks for the feedback!
Let me walk through the use cases for the two methods you mentioned, since that should explain why the text field is set to empty.
- SaveReferenceAsync: This stores only a reference to the source text and an embedding of that text, not the text itself. It is useful when you have fast, cheap access to the original text in another store, can look it up using the reference, and don't want to keep a second copy in the vector store. All you want at retrieval time is the reference, so that you can fetch the data from the source and pass it to the LLM.
- SaveInformationAsync: This stores the text as well as the embedding in the store. It is useful when storage space in the vector store isn't a concern, or when retrieving the text from the source at runtime may be expensive or impossible. At query time, you want the text returned as well.
Both methods require the text as input to generate the embedding, but only SaveInformationAsync actually persists the text.
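To make the difference concrete, here is a minimal, self-contained sketch of the two storage patterns. It does not use Semantic Kernel: the dictionary-based store, the `Embed` letter-count function, and the `SaveInformation` / `SaveReference` helpers are hypothetical stand-ins that only mimic the behavior described above.

```csharp
using System;
using System.Collections.Generic;

// Toy in-memory "vector store": key -> (stored text, external source name, embedding).
// These names are illustrative assumptions, not Semantic Kernel types.
var store = new Dictionary<string, (string Text, string ExternalSourceName, float[] Embedding)>();

// Hypothetical stand-in for an embedding model: counts occurrences of letters a-z.
float[] Embed(string text)
{
    var vector = new float[26];
    foreach (var c in text.ToLowerInvariant())
    {
        if (c >= 'a' && c <= 'z') vector[c - 'a'] += 1f;
    }
    return vector;
}

// Mimics SaveInformationAsync: embed the text and persist the text as well.
void SaveInformation(string id, string text) =>
    store[id] = (text, string.Empty, Embed(text));

// Mimics SaveReferenceAsync: embed the text, but persist only the pointer to it.
void SaveReference(string externalId, string externalSourceName, string text) =>
    store[externalId] = (string.Empty, externalSourceName, Embed(text));

// The external system of record that a reference points back to.
var sourceDocuments = new Dictionary<string, string>
{
    ["docs/readme.md"] = "Semantic Kernel is an SDK for building AI apps."
};

SaveInformation("note-1", "Qdrant is a vector database.");
SaveReference("docs/readme.md", "local-files", sourceDocuments["docs/readme.md"]);

// Information record: the text comes straight from the store.
Console.WriteLine(store["note-1"].Text);

// Reference record: the stored text is empty, so resolve it from the source by key.
var reference = store["docs/readme.md"];
var resolved = reference.Text.Length == 0
    ? sourceDocuments["docs/readme.md"]
    : reference.Text;
Console.WriteLine(resolved);
```

In this sketch, the empty text on the reference record is intentional: the embedding still supports similarity search, and the key is enough to fetch the original text from wherever it actually lives.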
Thank you very much for taking the time to explain this!