EventFlow Lack of event segregation corrupts event store

Using IIdentity when persisting events can lead to overlapping identities between different identity types, causing events and aggregates to be corrupted.

For example, if LocationId has the value of abc, and the VoyageId also has the value of abc, then the LocationCreatedEvent and VoyageCreatedEvent will collide. Both events are valid and can be persisted, but when loading the events, an InvalidCastException will be thrown for both the location and the voyage.

The documentation does not state that an implementation of IIdentity must be globally unique (or pre-partitioned). I, personally, do not think that the IIdentity should be responsible for ensuring global uniqueness or partitioning. I suggest that responsibility is owned by the IEventPersistence who can better determine how to properly persist and retrieve events related to a specific aggregate and persistence strategy. However, the Identity<T> class (that is very well documented) currently has this logic, for what I assume are good reasons. As such, I think that particular logic should remain intact, but the IIdentity-interface should still be fully honored.

The problem is that the EventStoreBase does not currently provide any aggregate type information to the IEventPersistence. (Note that the IReadModelStore<T> does this, as well as the ISnapshotPersistence.) Instead, the EventStoreBase implicitly relies on the highly specific logic inside Identity<T> to provide a partitioned event record key based on the IIdentity-implementing class. The problem remains hidden as long as you use Identity<T>.

The challenge is that this problem affects all current event stores, which store data differently. Any fix to actually solve the problem would need to update the store implementations individually and migrate all existing event data.

Dec 29 '20 10:12 Reyhn3

The interface IIdentity to me intrinsically implies a globally unique identity due to the lack of typing/additional information within the interface. It quite literally is just a string value. If I were to try and use these values as keys to a dictionary I would naturally expect there to be a collision if an identity of abc existed and I tried to add another abc entry.

Consider the following identity implementation

public class StringIdentity : IIdentity
    {
        public StringIdentity(string value)
        {
           Value = value
        }

        public string Value {get;}
    }

If we take a look at your unit test example, I would argue that your FirstAggregateId are SecondAggregateId identities are equivalent to this StringIdentity class and thus it would be natural for a FirstAggregateId with a value of abc to collide with a SecondAggregateId also with a value of abc.

If each of those classes had extended Identity<T> they would end up generating ids like firstaggregate-3521dff2-a1c9-4436-971e-2f59a0dd732b and secondaggregate-3521dff2-a1c9-4436-971e-2f59a0dd732b and would throw an error on construction if the provided string did not abide by that format.

Dec 29 '20 23:12 WardenUnleashed

That aside, I couldn't quite deduce the exact reason you needed typing at the IEventPersistence level but something that could be of use would be to use pattern mathcing/reflection on the IIdentity instance passed into your implementation of IEventPersistence when commiting / loading events.

Dec 30 '20 00:12 WardenUnleashed

Yes, FirstAggregateId, SecondAggregateIdand StringIdentity are equivalent, when it comes to persistence. And that is exactly my point.

Conceptually, though, in DDD, an aggregate is something that can be identified, which makes the identity a vital part of an aggregate. However, even though aggregates may be globally unique in practice and within the domain, I have never heard that the actual identifying value is required to be globally unique.

Most IDs I work with are incremental integers. Maybe that's a left-over from old SQL-dictated systems, or just convenience, but when remodeling legacy systems to DDD, the domain's identity values are rarely changed. I don't think it is uncommon that e.g. LocationId and VoyageId both are ints starting at zero, and then you'd immediately encounter this problem.

I think it is reasonable to allow any value as an identifier, thus allowing overlapping values. And that works perfectly, both conceptually and programmatically. The problem arises when trying to store and retrieve the events - the persistence mechanism simply drops the scope on the assumption that global uniqueness has already been obtained (despite there being no programmatical constraint nor any documentation).

I disagree that a pattern match on IIdentity is a solution. Yes, it can be used to ensure validity of the value, for instance to assert that a social security number starts with the century 19xx and does not exceed the current year. But that is a part of the business logic of the domain model. It does not dictate that the very same values may not be used as identifiers for a completely different aggregate.

I argue that the (type of the) aggregate is the scope within which the values must be unique.

I also argue that it is the responsibility of IEventPersistence to persist data correctly, because it is the persistence model that decides to store all events in a single store - hence it should also be responsible for storing the scope.

Possible solutions:

Provide type information to IEventPersistence and let the storage mechanism handle identity scoping. This gives the developer full control of the identity value and what class/interface to use, and it doesn't enforce any scoping rules on the developer. It also gives the store the possibility to store events in separate tables etc. without having to parse the identity value. The downside is that this is only a partial solution, as each store needs to be updated and data migrated.
Remove the misleading IIdentity in favor of Identity<T> and LSP. Actually, the interface has no meaning on a domain level, but it is used as a general interface for other types as well. This might be the best choice (although I personally don't agree it is the correct solution). It makes it harder to test, but the persistence segregation becomes a moot point as values are already scoped by prefix. However, it is the most invasive change to EventFlow and it might break existing implementations relying on IIdentity but it will keep stores and data intact.
The third option is to just ignore the problem. Update the documentation to advice developers to always use Identity<T> and completely ignore the interface if there is any chance of overlapping identity values. My mistake was assuming the identity interface could be implemented any which way, but the IEventPersistence actually expects the data to be highly specific. The risk is that more developers may make the same assumption when seeing the interface.

Edit: Added the third option.

Dec 30 '20 07:12 Reyhn3

Note that I wrote that it should be the aggregate type that keeps events separate from each other.

I don't think it is correct that Identity<T> is using its own name as the scope, i.e. scoping identities by identity (not aggregate). This is a bit philosophical, but the same Id-class could potentially be used for different aggregates, which would then also collide.

Once again, the Identity<T> is only a simple DTO, and as a value object it should only apply business rules to the identity value. Events are bound to the identity of the aggregate and should be scoped by the aggregate, not the DTO.

To fix this, it might be as simple as change T from where T : Identity<T> to where T : IAggregateRoot<TIdentity> and use IAggregateRoot.Name instead of regex-parsing. But I have not yet investigated this, but I think it will be a major breaking change worthy of its own ticket.

Dec 30 '20 08:12 Reyhn3

Note that I wrote that it should be the aggregate type that keeps events separate from each other.

I'm inclined to disagree, in favor for the the Aggregate's identity instead.(which is how it currently works). Unfortunately your suggestion limits people's ability to roll their own identity systems. For example, if I wanted to make it an aggregate base that handled generating its own identity from a string, I could extend AggregateRoot and do just that!

I don't think it is correct that Identity<T> is using its own name as the scope, i.e. scoping identities by identity (not aggregate). This is a bit philosophical, but the same Id-class could potentially be used for different aggregates, which would then also collide.

They would only end up colliding if the same id was generated from the different Aggregate classes. However, this would violate the idea of uniqueness for aggregates. I've seen people implement a kind of GuidIdentity class and have all of their aggregates use that as their IIdentity class over Identity<T>. This solution wouldn't have any collision problems and I believe things would works just fine because that implementation always generate unique identities.

Once again, the Identity<T> is only a simple DTO, and as a value object it should only apply business rules to the identity value. Events are bound to the identity of the aggregate and should be scoped by the aggregate, not the DTO. To fix this, it might be as simple as change T from where T : Identity<T> to where T : IAggregateRoot<TIdentity> and use IAggregateRoot.Name instead of regex-parsing. But I have not yet investigated this, but I think it will be a major breaking change worthy of its own ticket.

Using Identity<T> is entirely optional and events are already bound to the identity of the aggregate. However, you have to ensure that the ids you are generating for each of your aggregate are unique.

Also, not sure if you've looked in-depth at Entity.cs but IIdentity is used in that area as well. So the scope of how Identity is used isn't just for identifying aggregates.

Dec 30 '20 21:12 WardenUnleashed

Hmm... I'm sorry, but am very confused by your answers. I must be misinterpreting you, for it seems that we are mostly talking about the same thing and are in agreement, but the wordings are wrong. Let me see if I can make it clearer.

Note that I wrote that it should be the aggregate type that keeps events separate from each other.

I'm inclined to disagree, in favor for the the Aggregate's identity instead.(which is how it currently works). Unfortunately your suggestion limits people's ability to roll their own identity systems. For example, if I wanted to make it an aggregate base that handled generating its own identity from a string, I could extend AggregateRoot and do just that!

My mistake; I was a bit unclear. I of course meant that the aggregate's type combined with the aggregate's identity should be the compound key (as in SQL) that segregate events in the persistence store.

Your example is exactly what I am trying to achieve but is unable to do because of the lack of type+identity-segregation. Our system has aggregate A with an int-based identity, but aggregate B also has an int-based identity. That's why we implemented our own IIdentity that simply converts the int to a string. But this cause collisions as described earlier and proven by the test.

One solution could be to extend our current aggregates with a new column for EventFlow-Identity<T>-based identities for millions of millions of records in an old system that has been running for years. To us it is very unappealing to do such an invasive modification and try to keep two identity-ranges in sync just for the sake of a third-party library.

Another solution would be to implement some kind of identity-converter from our own non-EventFlow-implicit-identity-rule-adherence to something that works in-process for EventFlow. However, that is an unnecessarily complex workaround for the IEventPersistence as IIdentity already is the language-supported mechanism for this.

They would only end up colliding if the same id was generated from the different Aggregate classes.

Agreed. That is the very exact reason for me creating this bug report.

However, this would violate the idea of uniqueness for aggregates.

I think this might be what is causing the confusion. I would appreciate some clarification in this matter. In the sentence immediately preceding your quoted one, you say "different aggregate classes" and that's the key.

Yes, aggregates are unique. However, their identity-values are only relevant within the scope of the aggregate type. Aggregates of different types may have the same identity-values, but the aggregates are still unique because they are of different types. The Voyage and Location aggregates may both have an identity of abc and still be unique, because they are of different types. The value abc for Voyage does not have the same meaning as abc for Location - they just happen to be the same binary sequences. There is no meaning in comparing the Voyage.Id == Location.Id, right? So why should the persistence suddenly make that kind of comparison?

Could you point me to any type of DDD-documentation that clearly supports your claim that all aggregates must have unique identity values across aggregate types? Because I can only find that the identity value must be unique within the aggregate type. I am not a DDD-expert, but I have not before encountered any problems with identities scoped by aggregate type inside the same domain. In this case of EventFlow, it is only a matter of the implementation of the persistence logic. In our other systems, we isolate aggregates by type so we have no such conflicts and I think most DDD-documentation supports this.

I've seen people implement a kind of GuidIdentity class and have all of their aggregates use that as their IIdentity class over Identity<T>. This solution wouldn't have any collision problems and I believe things would works just fine because that implementation always generate unique identities.

That's what Identity<T> does. But we would like to avoid creating new identities in our system just to facilitate persistence logic. We are trying to use identities that have a meaning in the business domain.

Using Identity<T> is entirely optional and events are already bound to the identity of the aggregate. However, you have to ensure that the ids you are generating for each of your aggregate are unique.

This is the root of the problem, isn't it? Your first sentence is, on itself, incorrect. But together with the second sentence, both are evidently correct. As was my third suggestion, this should be pointed out in the documentation. However, that would only side-step the root problem, which is that the optional Identity<T> is the only element in EventFlow assuming the responsibility to ensure the requirement of global uniqueness. A better solution that would completely nullify the root problem is to eliminate the requirement altogether by allowing the persistence stores to segregate by aggregate type. Which is what I am suggesting as my first solution.

Also, not sure if you've looked in-depth at Entity.cs but IIdentity is used in that area as well. So the scope of how Identity is used isn't just for identifying aggregates.

Actually, I was looking in to my second suggestion, which was to separate IIdentity from e.g. IAggregateIdentity, because, as you say, IIdentity is a language construct that is used for multiple purposes in EventFlow. Trying to separate and distinguish the two of them is a massive undertaking as they are very entangled. I would not want to do that because the risks are too high.

My proposal is still to make IEventPersistence aggregate type aware. That solution, apart from actually solving the problem, has the most additional benefits and the least risks.

Jan 04 '21 09:01 Reyhn3

Sorry for the delayed response!

Could you point me to any type of DDD-documentation that clearly supports your claim that all aggregates must have >unique identity values across aggregate types?

Sure! Those rules are defined in Domain Driven Design: Tackling Complexity of Hardware by Eric Evans.

Now, to translate that conceptual AGGREGATE into the implementation, we need a set of rules to apply to all transactions.

The root ENTITY has global identity and is ultimately responsible for checking invariants.

Root ENTITIES have global identity. ENTITIES inside the boundary have local identity, unique only within the AGGREGATE.

Eric Evans

To me global identity means a unique identity, especially when I think about it in context of the advice he gives when modeling entities

When an object is distinguished by its identity, rather than its attributes, make this primary to its definition in the model. Keep the class definition simple and focused on life cycle continuity and identity. Define a means of distinguishing each object regardless of its form or history. Be alert to requirements that call for matching objects by attributes. Define an operation that is guaranteed to produce a unique result for each object, possibly by attaching a symbol that is guaranteed unique. This means of identification may come from the outside, or it may be an arbitrary identifier created by and for the system, but it must correspond to the identity distinctions in the model. The model must define what it means to be the same thing.

Eric Evans

Having a global identity for entities(and aggregate roots by extension) was stressed pretty hard in the book.

That's what Identity<T> does. But we would like to avoid creating new identities in our system just to facilitate persistence logic. We are trying to use identities that have a meaning in the business domain.

Is having a base class that all identities extend off of not a viable option?

Jan 22 '21 05:01 WardenUnleashed

Hello there!

We hope you are doing well. We noticed that this issue has not seen any activity in the past 90 days. We consider this issue to be stale and will be closing it within the next seven days.

If you still require assistance with this issue, please feel free to reopen it or create a new issue.

Thank you for your understanding and cooperation.

Best regards, EventFlow

Apr 08 '23 13:04 github-actions[bot]

Hello there!

This issue has been closed due to inactivity for seven days. If you believe this issue still needs attention, please feel free to open a new issue or comment on this one to request its reopening.

Thank you for your contribution to this repository.

Best regards, EventFlow

Apr 16 '23 09:04 github-actions[bot]