
How to handle command validations based on database values

Open LaiaPerez88 opened this issue 4 years ago • 3 comments

Due to Booster's asynchronous nature, it's not possible to be sure that Booster.entity() and Booster.readModel() return the latest data, as newer events can still be "in flight". Yet sometimes validation is necessary. Here is another example involving references, but slightly different: an invoice application generates incremental human-readable invoice numbers (say, based on the approach from the previous question). However, a user should still be able to update existing invoice numbers if needed, so functionally there is a unique constraint on the invoice number. Now a user updates an invoice number while a new invoice is being created by a colleague. The two commands are received in parallel, so no knowledge can yet be extracted from entities or read models to validate that a duplicate invoice number would be created.

Possible solutions:

  • `UpdateDossierReference` and `CreateDossier` commands should only run sequentially. `CreateDossier` commands can still run in parallel to ensure fast bulk inserts, but once one `UpdateDossierReference` is executed, no other `UpdateDossierReference` and `CreateDossier` commands should be run. Next to that, the ongoing command should also be able to check that no events are still in flight, to be sure that `Booster.entity()` returns the latest reference.

  • Alternatively, we can accept that validation is not possible in the first place and that two events with the same reference number will be fired. An event handler could then process these events one by one, and the second event should somehow be discarded/cancelled; otherwise, two events referring to the same reference number reach the database. (A sketch of this option follows below.)
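Roughly, I imagine the second option looking something like the sketch below. All class names are hypothetical (only the Booster decorators, Booster.entity(), and register.events() are real APIs): every claim on a reference is reduced into a single registry entity, and an event handler emits a compensating event for claims that lost the race.

```typescript
import { Booster, Entity, EventHandler, Reduces } from '@boostercloud/framework-core'
import { Register } from '@boostercloud/framework-types'

const REGISTRY_ID = 'reference-registry'

export class ReferenceClaimed {
  public constructor(readonly dossierId: string, readonly reference: string) {}
  // Route all claims to one registry entity so they are reduced serially.
  public entityID(): string { return REGISTRY_ID }
}

export class ReferenceRejected {
  public constructor(readonly dossierId: string, readonly reference: string) {}
  // A Dossier reducer (not shown) would clear the rejected reference.
  public entityID(): string { return this.dossierId }
}

@Entity
export class ReferenceRegistry {
  public constructor(readonly id: string, readonly owners: Record<string, string>) {}

  @Reduces(ReferenceClaimed)
  public static onClaim(event: ReferenceClaimed, current?: ReferenceRegistry): ReferenceRegistry {
    const owners = { ...(current?.owners ?? {}) }
    // First claim wins; later claims for the same reference don't overwrite it.
    if (!(event.reference in owners)) {
      owners[event.reference] = event.dossierId
    }
    return new ReferenceRegistry(REGISTRY_ID, owners)
  }
}

@EventHandler(ReferenceClaimed)
export class RejectLosingClaims {
  public static async handle(event: ReferenceClaimed, register: Register): Promise<void> {
    const registry = await Booster.entity(ReferenceRegistry, REGISTRY_ID)
    const owner = registry?.owners[event.reference]
    // Eventual consistency applies here too: the reducer may not have run yet,
    // so a production version would need retries or idempotent rejection.
    if (owner && owner !== event.dossierId) {
      register.events(new ReferenceRejected(event.dossierId, event.reference))
    }
  }
}
```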

Another example is a command DeleteDossier which triggers an event DossierDeleted. When the command is triggered twice in parallel, two DossierDeleted events will be stored in the database. This will not impact the entities and read models, but it will pollute the list of events, which also causes functional problems when using the events as a visual "audit trail" in the application.

LaiaPerez88 · Jul 27 '21 16:07

@thomas-advantitge, the best you can do to handle this is to use Booster.entity() to read data from your database and run any validation you want. Take into account that between the moment you read your entity and the moment you publish new events, other events for the same entity may have arrived and changed its value. In other words, right now we don't have any transactional mechanism, as Booster is eventually consistent by nature.
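A minimal sketch of that best-effort validation, with hypothetical Invoice names (Booster.entity(), @Command, @Entity, @Reduces, and register.events() are the real Booster APIs here):

```typescript
import { Booster, Command, Entity, Reduces } from '@boostercloud/framework-core'
import { Register } from '@boostercloud/framework-types'

export class InvoiceNumberChanged {
  public constructor(readonly invoiceId: string, readonly newNumber: string) {}
  public entityID(): string { return this.invoiceId }
}

@Entity
export class Invoice {
  public constructor(readonly id: string, readonly number: string) {}

  @Reduces(InvoiceNumberChanged)
  public static onNumberChanged(event: InvoiceNumberChanged, current?: Invoice): Invoice {
    return new Invoice(event.invoiceId, event.newNumber)
  }
}

@Command({ authorize: 'all' })
export class UpdateInvoiceNumber {
  public constructor(readonly invoiceId: string, readonly newNumber: string) {}

  public static async handle(command: UpdateInvoiceNumber, register: Register): Promise<void> {
    // Best-effort check only: between this read and the new event being
    // stored, other in-flight events may still change the entity.
    const invoice = await Booster.entity(Invoice, command.invoiceId)
    if (!invoice) {
      throw new Error(`Invoice ${command.invoiceId} does not exist`)
    }
    register.events(new InvoiceNumberChanged(command.invoiceId, command.newNumber))
  }
}
```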

A solution that would be 100% consistent for your use case would be to generate the incremental human-readable invoice number in a reducer. Reducers are executed serially, no matter how many events are generated in parallel. The drawback is that you can't return errors to the users, but I think that is not a problem for your use case. As a side note: it is highly improbable that two events fire at exactly the same time, especially because Booster tags events internally with millisecond precision. But related to the use case in the first question, the DossierCount entity would be in charge of the reference count. Modification could perhaps be prevented by adding a field recording the latest event that touched it, so that commands can be retried from an event handler in case it was an update.
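For instance, a counter entity along these lines (names are illustrative) would assign numbers without races, because Booster never reduces two events for the same entity ID concurrently:

```typescript
import { Entity, Reduces } from '@boostercloud/framework-core'

const COUNTER_ID = 'invoice-counter'

export class InvoiceCreated {
  public constructor(readonly invoiceId: string) {}
  // Route every event to the same entity instance so reduction is serial.
  public entityID(): string { return COUNTER_ID }
}

@Entity
export class InvoiceCounter {
  public constructor(
    readonly id: string,
    readonly lastNumber: number,
    readonly lastInvoiceId: string
  ) {}

  @Reduces(InvoiceCreated)
  public static onCreated(event: InvoiceCreated, current?: InvoiceCounter): InvoiceCounter {
    // Reductions for one entity ID run one at a time, so this increment
    // cannot produce duplicate numbers.
    return new InvoiceCounter(COUNTER_ID, (current?.lastNumber ?? 0) + 1, event.invoiceId)
  }
}
```

The trade-off is throughput: routing every event to one entity instance serializes all numbering through a single reducer, which is exactly the sequencing you asked for.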

Regarding event store pollution, Booster doesn't deal with that right now, but from the point of view of this use case, I don't see what's wrong with recording two DossierDeleted events. From a performance point of view, I'm not sure that locking down the system to guarantee a unique deletion event would be a good trade-off.

LaiaPerez88 · Jul 27 '21 16:07

Related to the last part concerning event store pollution: indeed, locking down the system doesn't seem like a solution and feels contradictory to Booster's concepts.

Downsides of having duplicate DossierDeleted events:

  • When using events for auditing, say in a timeline in a web application showing the whole history, multiple (duplicate) items would be visible in this timeline. They can be merged, but that feels like patching over something that shouldn't happen in the first place.
  • Events could be used for reporting. BI tools and spreadsheets could really benefit from having fine-grained information instead of just "entity" information. E.g., the activity in a dossier could be measured by counting the number of update events, and the number of deleted dossiers could also be tracked. These numbers will be inconsistent if a user is able to insert duplicate events.

Maybe having an optional unique identifier/constraint could be a solution to prevent duplicate inserts? Of course, this will probably not be a problem in most cases; these kinds of architectural questions were raised in relation to a migration/import of 200k+ entities from a legacy system.
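Booster doesn't offer such a constraint today, but purely as an illustration of the idea: on AWS, a uniqueness guard could be built with a DynamoDB conditional write (the table name and key schema below are assumptions):

```typescript
import {
  ConditionalCheckFailedException,
  DynamoDBClient,
  PutItemCommand,
} from '@aws-sdk/client-dynamodb'

const client = new DynamoDBClient({})

// Returns true if this caller claimed the key first, false if someone else did.
export async function claimUnique(key: string): Promise<boolean> {
  try {
    await client.send(
      new PutItemCommand({
        TableName: 'unique-constraints', // assumed table
        Item: { pk: { S: key } },
        // The write fails atomically if the key already exists.
        ConditionExpression: 'attribute_not_exists(pk)',
      })
    )
    return true
  } catch (error) {
    if (error instanceof ConditionalCheckFailedException) return false
    throw error
  }
}
```

A command or migration script could call claimUnique(reference) and skip the insert when it returns false.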

thomas-advantitge · Jul 28 '21 13:07

An idea borrowed from the actor model: in an actor world, every actor has a FIFO queue called a mailbox, and the mailbox resolves data races by serializing message processing. On AWS, we could create an SQS FIFO queue and make the message group ID equal to an entity's ID.
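A sketch of that mailbox idea with the AWS SDK v3 (the queue URL is a placeholder): messages sharing a MessageGroupId on an SQS FIFO queue are delivered in order, one at a time per group, much like an actor's mailbox.

```typescript
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs'

const sqs = new SQSClient({})

export async function enqueueCommand(entityId: string, payload: unknown): Promise<void> {
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: 'https://sqs.eu-west-1.amazonaws.com/123456789012/commands.fifo', // assumed
      MessageBody: JSON.stringify(payload),
      // Same group ID -> FIFO ordering per entity, like an actor mailbox.
      MessageGroupId: entityId,
      // FIFO queues also require deduplication (or content-based dedup on the queue).
      MessageDeduplicationId: `${entityId}-${Date.now()}`,
    })
  )
}
```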

purefun · Aug 09 '21 11:08