upsert_vectors() shouldn't use element IDs?
It seems that using elementIds is not safe outside the scope of a single transaction as per the official Neo4j documentation.
upsert_vectors() expects an ids list which is then injected into a Cypher query depending on the entity_type targeted, for example here for nodes.
Requiring the ids to be element IDs means they must be retrieved in a different session from the one that upsert_vectors() starts through its .execute_query() call here. This contradicts the official recommendation, since those IDs can't be considered stable across sessions/transactions.
Suggested changes:
- support passing a user-created ID to upsert_vectors() instead (e.g. a UUID property on nodes/relationships)
- and/or support elementId matching in the scope of the same session/transaction as the underlying upsert query
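To illustrate the first suggestion, here is a rough sketch of what matching on a user-created ID could look like. This is not the library's implementation: the `Chunk` label, the `uuid` property, and the helper names `build_rows`, `batched`, and `upsert_vectors_by_uuid` are all hypothetical, and it assumes a uniqueness constraint (or index) on the `uuid` property so the MATCH stays cheap. It relies only on the Python driver's `execute_query()` and the `db.create.setNodeVectorProperty` procedure.

```python
from typing import Any, Iterator, Optional, Sequence

# Hypothetical label (Chunk) and property (uuid); adjust to your own schema.
# Matching on a stable, user-managed property avoids relying on elementId().
UPSERT_BY_UUID_QUERY = """
UNWIND $rows AS row
MATCH (n:Chunk {uuid: row.id})
CALL db.create.setNodeVectorProperty(n, $embedding_property, row.embedding)
RETURN count(n) AS updated
"""


def build_rows(
    ids: Sequence[str], embeddings: Sequence[Sequence[float]]
) -> list[dict[str, Any]]:
    """Pair each user-created ID with its embedding for use with UNWIND."""
    if len(ids) != len(embeddings):
        raise ValueError("ids and embeddings must have the same length")
    return [{"id": i, "embedding": list(e)} for i, e in zip(ids, embeddings)]


def batched(rows: list[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def upsert_vectors_by_uuid(
    driver: Any,  # expected to be a neo4j.Driver (anything with execute_query)
    ids: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    embedding_property: str = "embedding",
    batch_size: int = 1000,
    database: Optional[str] = None,
) -> int:
    """Send the rows in batches, one execute_query() call per batch.

    Returns the number of rows sent, not the number of nodes matched.
    """
    rows = build_rows(ids, embeddings)
    for batch in batched(rows, batch_size):
        driver.execute_query(
            UPSERT_BY_UUID_QUERY,
            parameters_={"rows": batch, "embedding_property": embedding_property},
            database_=database,
        )
    return len(rows)
```

With a real driver this would be called as `upsert_vectors_by_uuid(driver, ["uuid-1", "uuid-2"], [[0.1, 0.2], [0.3, 0.4]])`. Since the IDs are ordinary property values, they can be read in one session and used in another without the stability concern that applies to element IDs.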
Let me know if I'm missing something.
Hi @Herakleis ,
Thank you for raising the issue!
You're right that using element IDs outside a transaction is not safe. This function was created mainly to help new (beginner) users import a few vectors, i.e. in a scenario where conflicting element IDs are very unlikely because no other transaction is likely to be running in the db at the same time. We should make this clearer in the documentation though.
To make this function usable for larger datasets and/or live databases, more work is required: the change you suggested, but also performance improvements. Unfortunately, this is not planned for now.
What is the correct method to use for larger datasets or embeddings with large dimensions? And would you use asynchronous execute_write for this?