LinkedDataEventStreams

Consistent graph replication - RDF Dataset Canonicalization

Open · sandervd opened this issue 2 years ago · 7 comments

When a client requires hard guarantees on consistency, the algorithm described in RDF Dataset Canonicalization could be used to provide hashes of the state that should be reached after applying a fragment or, even better, a transaction. This becomes relevant when LDES is used as a replication protocol for named graphs, where the client should end up with an exact copy of the named graph the publisher intended. For instance, consistency could be lost if a client is offline for longer than the retention period allows, which could result in missed delete operations (tombstone events). If a checksum mismatch is detected, the client must restart replication from the start of the log to arrive at a consistent state.

Reference: https://www.w3.org/TR/rdf-canon/
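A minimal Python sketch of the checksum idea, for illustration only. It is not RDFC-1.0: it assumes the named graph contains no blank nodes and that every term is already written in canonical N-Quads form. Under those assumptions, sorting the N-Quads lines and hashing them with SHA-256 gives an order-independent fingerprint of the graph. The `dataset_hash` helper and the `example.org` IRIs are hypothetical. A replica that missed a tombstone still holds the deleted quad, so its hash no longer matches the publisher's.

```python
import hashlib
from collections.abc import Iterable

# (subject, predicate, object, graph), each term already serialized in N-Quads syntax.
Quad = tuple[str, str, str, str]


def dataset_hash(quads: Iterable[Quad]) -> str:
    """SHA-256 over the sorted, de-duplicated N-Quads lines of a dataset.

    Simplified stand-in for RDF Dataset Canonicalization: only valid when there
    are no blank nodes, so no blank node relabeling is needed.
    """
    lines = sorted({f"{s} {p} {o} {g} .\n" for s, p, o, g in quads})
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


G = "<http://example.org/graph>"  # hypothetical named graph
a = ("<http://example.org/A>", "<http://example.org/state>", '"State 2"', G)
b = ("<http://example.org/B>", "<http://example.org/state>", '"Some value"', G)
stale = ("<http://example.org/C>", "<http://example.org/state>", '"deleted"', G)

# Checksum the publisher would attach after the fragment or transaction.
expected = dataset_hash([a, b])

# A client that missed the tombstone for C still holds the stale quad.
replica = [b, a, stale]
in_sync = dataset_hash(replica) == expected  # False: restart from the start of the log
```

Because the lines are sorted before hashing, publisher and client only need to agree on the content of the graph, not on the order in which quads were stored or received.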

sandervd, Dec 20 '23 15:12

I think this can be applied generically to TREE (tree client)?

xdxxxdx, Feb 08 '24 08:02

Hmm, I was thinking more of including a hash on each member (version object) that would represent the state of the full represented graph after applying the change. For instance, if we had a collection {(1, A, State 1), (2, B, Some value), (3, A, State 2)}, then after applying the 3rd member we would have the graph {(A: State 2), (B: Some value)}. The hash should in this case be the hash of the state of the full graph, if that makes sense 😄 This way we can give much stronger guarantees of consistency.
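The example above can be worked through in a short Python sketch. Each member is modeled as a hypothetical `(sequence, subject, new state)` tuple, a newer version replaces the subject's previous state, and the full graph is hashed after every member. The hashing is a simplified stand-in for RDFC-1.0 (sorted N-Quads-style lines plus SHA-256), assuming no blank nodes and literal values that need no escaping; the `example.org` IRIs and the function names are illustrative.

```python
import hashlib

Member = tuple[int, str, str]  # hypothetical: (sequence number, subject, new state)


def state_hash(state: dict[str, str]) -> str:
    # One N-Quads-style line per subject, sorted so that only the content of the
    # graph matters, not the order in which the members were applied.
    # Assumes no blank nodes and literal values that need no escaping.
    lines = sorted(
        f'<http://example.org/{subject}> <http://example.org/state> "{value}" .\n'
        for subject, value in state.items()
    )
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


def annotate(members: list[Member]) -> list[tuple[Member, str]]:
    """Pair every member with the hash of the full graph after applying it."""
    state: dict[str, str] = {}
    annotated = []
    for member in sorted(members):  # apply in log (sequence) order
        _, subject, value = member
        state[subject] = value  # the newer version replaces the subject's previous state
        annotated.append((member, state_hash(state)))
    return annotated


collection = [(1, "A", "State 1"), (2, "B", "Some value"), (3, "A", "State 2")]
annotated = annotate(collection)
final_hash = annotated[-1][1]  # hash of {(A: State 2), (B: Some value)}
```

Each hash covers the whole graph rather than just the changed subject, which is what lets a client check that its complete replica, and not only the latest member, is correct.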

Of course, the hashes would only be valid in the tail of the log, because retention deletes versions of objects that have a newer state further in the log.
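The point about retention can be illustrated with a small Python sketch. It assumes a hypothetical retention policy that keeps only the latest version of each subject, and uses the same kind of simplified state hash (sorted N-Quads-style lines plus SHA-256, not full RDFC-1.0). A client replaying the retained log reproduces the published hash of member 3, but not that of member 2, because member 1 (the older state of A) has been removed.

```python
import hashlib

Member = tuple[int, str, str]  # hypothetical: (sequence number, subject, new state)


def state_hash(state: dict[str, str]) -> str:
    # Simplified stand-in for RDFC-1.0: sorted N-Quads-style lines, no blank nodes.
    lines = sorted(
        f'<http://example.org/{s}> <http://example.org/state> "{v}" .\n'
        for s, v in state.items()
    )
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


def hashes_after_each(log: list[Member]) -> dict[int, str]:
    """Hash of the full graph after applying each member, keyed by sequence number."""
    state: dict[str, str] = {}
    hashes: dict[int, str] = {}
    for seq, subject, value in sorted(log):
        state[subject] = value
        hashes[seq] = state_hash(state)
    return hashes


def apply_retention(log: list[Member]) -> list[Member]:
    """Hypothetical policy: drop every version that has a newer state further in the log."""
    latest: dict[str, Member] = {}
    for member in sorted(log):
        latest[member[1]] = member  # later versions overwrite earlier ones per subject
    return sorted(latest.values())


def verifiable(log: list[Member]) -> list[int]:
    """Sequence numbers whose published hash a new client can still reproduce."""
    published = hashes_after_each(log)  # attached by the publisher before retention ran
    replayed = hashes_after_each(apply_retention(log))
    return [seq for seq, h in replayed.items() if h == published[seq]]


log = [(1, "A", "State 1"), (2, "B", "Some value"), (3, "A", "State 2")]
```

In this example only member 3 can still be verified; the hash published for member 2 describes a state (A still at State 1) that can no longer be reconstructed from the retained members.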

sandervd, Feb 09 '24 12:02

I actually use that here to transform data dumps into an LDES feed: https://github.com/pietercolpaert/DCAT-AP-Dumps-To-Feeds/blob/main/index.ts#L59

I’m not sure, however, what the impact on the LDES spec itself would be. Do you expect this hash to be present in the member? Do you want a path to point to that property?

pietercolpaert, Feb 26 '24 10:02

Yes, I would see it as metadata of an event, similar to its timestamp. The hash would indicate the state of the graph after applying the member (or members, in the case of a transaction). This way we can assure graph integrity over time: the client can validate that it holds an exact replica of the graph that was published or intended. I see this as an important guarantee in cases like the base registries, etc.
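To make the "metadata of an event" idea concrete, here is a hedged Python sketch. The `Event` shape (a timestamp, a transaction of changes, and the hash of the full graph after the whole transaction) is hypothetical and not defined by the LDES spec; `None` stands for a tombstone, and the hash is again a simplified stand-in for RDFC-1.0 that assumes no blank nodes. A client that skipped the tombstone detects the mismatch and rebuilds its replica from the start of the log.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

Changes = dict[str, Optional[str]]  # subject -> new state; None is a tombstone (delete)


def state_hash(state: dict[str, str]) -> str:
    # Simplified stand-in for RDFC-1.0: sorted N-Quads-style lines, no blank nodes.
    lines = sorted(
        f'<http://example.org/{s}> <http://example.org/state> "{v}" .\n'
        for s, v in state.items()
    )
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


@dataclass
class Event:
    timestamp: str
    changes: Changes  # all members of one transaction
    graph_hash: str  # hypothetical metadata: graph hash after the whole transaction


def apply(state: dict[str, str], changes: Changes) -> None:
    for subject, value in changes.items():
        if value is None:
            state.pop(subject, None)  # tombstone: remove the subject
        else:
            state[subject] = value


def publish(transactions: list[tuple[str, Changes]]) -> list[Event]:
    """Publisher side: apply each transaction and attach the resulting graph hash."""
    state: dict[str, str] = {}
    log = []
    for timestamp, changes in transactions:
        apply(state, changes)
        log.append(Event(timestamp, changes, state_hash(state)))
    return log


def replicate(log: list[Event], state: dict[str, str], resume_at: int) -> tuple[dict[str, str], bool]:
    """Resume at index `resume_at`; on a hash mismatch, rebuild from the start of the log."""
    for event in log[resume_at:]:
        apply(state, event.changes)
        if state_hash(state) != event.graph_hash:
            fresh: dict[str, str] = {}
            for e in log:
                apply(fresh, e.changes)
            return fresh, True
    return state, False


log = publish([
    ("t1", {"A": "State 1", "C": "temp"}),
    ("t2", {"C": None}),  # tombstone for C
    ("t3", {"A": "State 2", "B": "Some value"}),
])
# Simulate a client that applied t1 but missed the tombstone in t2.
replica, restarted = replicate(log, {"A": "State 1", "C": "temp"}, resume_at=2)
```

Hashing once per transaction rather than once per member means the client only checks states the publisher actually intended to expose, not the intermediate states inside a transaction.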

sandervd, Mar 27 '24 17:03