Should a `ldes:timestampPath` always be present and the members be in order?
Since a Linked Data Event Stream (LDES) is a stream of events in linked data, and events occur at specific points in time, each member (event) should have a timestamp.
I propose making this explicit in the specification by requiring that:
- An LDES MUST have an `ldes:timestampPath` property set.
- Every member in the LDES MUST have a timestamp value using this property.
Depending on the outcome of #61, if a member received from the backend system lacks a timestamp, the ingestion process could attach one using the timestamp of ingestion. This approach is already suggested at the end of the second-to-last note in section 2 of the current specification, but I propose formalizing it instead of leaving it as a note.
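
To make the ingestion fallback concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes a member is represented as a plain dictionary keyed by predicate IRI; `TIMESTAMP_PATH` and `ensure_timestamp` are hypothetical names invented for this example.

```python
from datetime import datetime, timezone

# Hypothetical predicate IRI that the LDES declares as its ldes:timestampPath.
TIMESTAMP_PATH = "http://example.org/ingestedAt"


def ensure_timestamp(member: dict, timestamp_path: str, now: datetime) -> dict:
    """Return a copy of the member that is guaranteed to carry a timestamp.

    A timestamp supplied by the backend is kept as-is; only members without
    one receive the ingestion time.
    """
    stamped = dict(member)
    if timestamp_path not in stamped:
        stamped[timestamp_path] = now.isoformat()
    return stamped


ingested_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
without_ts = {"@id": "http://example.org/member/1"}
with_ts = {"@id": "http://example.org/member/2", TIMESTAMP_PATH: "2023-12-31T00:00:00+00:00"}

stamped_a = ensure_timestamp(without_ts, TIMESTAMP_PATH, ingested_at)
stamped_b = ensure_timestamp(with_ts, TIMESTAMP_PATH, ingested_at)
```

The input member is copied rather than mutated, so the ingestion step stays free of side effects on whatever the backend handed over.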
Additionally, I want to open the discussion on requiring that members MUST be added to the LDES in ascending order. Since an LDES is a stream of events, ordering should be expected, but this is not explicitly stated in the specification. Enforcing this would ensure that after a client replicates the entire LDES, any new members found during synchronization are guaranteed to be newer.
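
As an illustration of what enforcing the order could look like on the producing side, here is a rough sketch. The `OrderedStream` class is hypothetical, and it assumes that equal timestamps are allowed (non-decreasing order); whether ties should be permitted is exactly the kind of detail the specification text would need to settle.

```python
from datetime import datetime, timezone


class OrderedStream:
    """Append-only list of members that rejects out-of-order additions."""

    def __init__(self) -> None:
        self.members = []  # list of (timestamp, member_id) in insertion order

    def append(self, member_id: str, timestamp: datetime) -> None:
        # Assumption: ties are accepted, strictly older timestamps are not.
        if self.members and timestamp < self.members[-1][0]:
            raise ValueError(
                f"{member_id} is older than the last member in the stream"
            )
        self.members.append((timestamp, member_id))


stream = OrderedStream()
stream.append("m1", datetime(2024, 1, 1, tzinfo=timezone.utc))
stream.append("m2", datetime(2024, 1, 2, tzinfo=timezone.utc))

try:
    stream.append("m0", datetime(2023, 12, 31, tzinfo=timezone.utc))
    rejected = False
except ValueError:
    rejected = True
```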
These requirements will facilitate a more efficient implementation of smarter state management using timestampPath in the ldes-client reference implementation https://github.com/rdf-connect/ldes-client/issues/19.
- The first requirement ensures that state management can be applied to any LDES.
- The second requirement aligns with an existing assumption in the proposal, but since the current specification does not enforce ordering, this needs to be explicitly stated.
Related to #35
I would like to note that, yes, if the ingestion timestamp is enforced we could do some simpler state management, but only after fully replicating the LDES [1]. Once that is done, we can 'forget' all members we've seen and just remember a timestamp that acts as a lower bound for new members. This does not hold when the timestamp points to anything other than ingestion time, as those values can be arbitrary.
[1]: If an LDES doesn't have a good time-based fragmentation, the client needs to know it has seen every member, as some far-away relation might still lead to a member with a smaller timestamp. (This currently happens when emitting ordered.)
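
The state handling described above could be sketched roughly as follows. This is only an illustration under the assumption that the timestamp is the ingestion time and that members are added in non-decreasing order; `ClientState` and its methods are hypothetical names, not the ldes-client API.

```python
from datetime import datetime, timezone


class ClientState:
    """Tracks seen members; collapses to a timestamp bound after replication."""

    def __init__(self) -> None:
        self.seen_ids = set()          # needed only while replicating
        self.watermark = None          # highest timestamp seen so far
        self.ids_at_watermark = set()  # members sharing that exact timestamp
        self.replicated = False

    def should_emit(self, member_id: str, timestamp: datetime) -> bool:
        if self.replicated:
            # Anything older than the watermark was seen during replication.
            if self.watermark is not None and timestamp < self.watermark:
                return False
            if timestamp == self.watermark and member_id in self.ids_at_watermark:
                return False
        elif member_id in self.seen_ids:
            return False
        else:
            self.seen_ids.add(member_id)
        self._track(member_id, timestamp)
        return True

    def _track(self, member_id: str, timestamp: datetime) -> None:
        if self.watermark is None or timestamp > self.watermark:
            self.watermark = timestamp
            self.ids_at_watermark = {member_id}
        elif timestamp == self.watermark:
            self.ids_at_watermark.add(member_id)

    def mark_replicated(self) -> None:
        """Call once every relation has been followed; drops the id set."""
        self.replicated = True
        self.seen_ids.clear()


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
t3 = datetime(2024, 1, 3, tzinfo=timezone.utc)

state = ClientState()
# During replication members may arrive in any order, so ids are tracked.
first_pass = [
    state.should_emit("m2", t2),
    state.should_emit("m1", t1),
    state.should_emit("m2", t2),
]
state.mark_replicated()
second_pass = [
    state.should_emit("m1", t1),  # older than the watermark
    state.should_emit("m2", t2),  # same timestamp, already seen
    state.should_emit("m3", t2),  # same timestamp, new member
    state.should_emit("m4", t3),  # newer than the watermark
]
```

The small set of ids at the watermark is kept so that several members sharing the latest timestamp are not dropped or emitted twice.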
A fix has been proposed for this in #71. Please review the new text on this issue and propose changes if necessary.
So in short:
ldes:timestampPath becomes a SHOULD that offers a lot of benefits for the client, although the client MUST still be able to do synchronization without it.
In the PR, the ldes:timestampPath decides the order of the LDES. Thus, the consequence
- for a server is that it MUST provide the members in order
- for a client is that it can count on the fact that no members will be published with a timestamp earlier than the last one it has seen, which gives benefits in state management.
This means, however, that when you produce an LDES in a way that cannot guarantee the order, you also cannot set an ldes:timestampPath, yet you can still describe your collection as an ldes:EventStream: the members remain immutable. It thus remains possible to not have an ldes:timestampPath, which means a client MUST keep supporting LDESs without one.
Without an ldes:timestampPath, however, I do not think you will be able to get any benefit out of interpreting the tree:Relations, as all relations will need to be followed. Nevertheless, the client can still keep in its state which tree:Nodes have been flagged as immutable, and not fetch those again.
The problems in the setting without an ldes:timestampPath are:
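
A rough sketch of that fallback, assuming a hypothetical `fetch` callable that returns whether a node is flagged immutable together with the targets of all its relations. The point is only to show that every relation is followed while known-immutable nodes are skipped; it does not mirror any existing client code.

```python
from collections import deque
from typing import Callable, List, Set, Tuple

# A fetched node is modelled as (is_immutable, relation_target_urls).
Node = Tuple[bool, List[str]]


def sync_pass(
    fetch: Callable[[str], Node],
    root: str,
    immutable: Set[str],
    mutable: Set[str],
) -> List[str]:
    """Run one synchronization pass and return the URLs that were fetched.

    `immutable` and `mutable` are the client state carried between passes.
    """
    # Start from the known mutable nodes, or from the root on the first pass.
    queue = deque(sorted(mutable) or [root])
    visited: Set[str] = set()
    fetched: List[str] = []
    while queue:
        url = queue.popleft()
        if url in visited or url in immutable:
            continue  # immutable nodes cannot change, so never refetch them
        visited.add(url)
        is_immutable, targets = fetch(url)
        fetched.append(url)
        if is_immutable:
            immutable.add(url)
            mutable.discard(url)
        else:
            mutable.add(url)
        # Without a timestamp path no relation can be pruned: follow them all.
        queue.extend(targets)
    return fetched


# Hypothetical three-node view used only for this example.
pages = {
    "root": (False, ["a", "b"]),
    "a": (True, []),
    "b": (False, []),
}
immutable_nodes: Set[str] = set()
mutable_nodes: Set[str] = set()

first = sync_pass(pages.__getitem__, "root", immutable_nodes, mutable_nodes)
second = sync_pass(pages.__getitem__, "root", immutable_nodes, mutable_nodes)
```

Note that this only bounds the number of node fetches; the client still has to remember which members it has already emitted from the mutable nodes.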
- For a client: the state can get really big and consume an unreasonable amount of memory
- For a server: setting a retention policy also becomes impossible as this relies on the order defined by this `ldes:timestampPath`.
In the server primer we should therefore strongly discourage publishing out-of-order LDESs. An LDES producer however can fix that by adding another timestamp to every member indicating when they arrived in the `ldes:EventSource`.
We should talk about this still when writing the server primer!
Server primer being proposed in #98
An initial server primer has been published. It starts from what a client must be able to take into account and translates this into normative text. Feel free to still review the text and open issues if you’re not fully content with how we tackled it in the newly published spec. Thanks for opening this long-standing issue again and guiding the discussion forward to what is now in the spec!