
FEATURE: Record and show the history of changes to an entity over time

Open ozhyrenkov opened this issue 3 years ago • 1 comments

Is your feature request related to a problem? Please describe. The problem to solve can be briefly described as data dynamics. Some datasets are one-time loads, such as leaks of various kinds, but others are updated regularly, including, but not limited to:

  • Company registries from official sources.
  • Real estate lists, including cadastral maps and so on.
  • Customs databases.

In any of the aforementioned datasets, as well as in datasets of other kinds, the data might change. For Intervals this works fine, for example a sequence of successors or of owners/directors. But for changes to attributes there is no clear way to trace the history of changes. Examples of this kind:

  • Renaming of entities: NGO Aleph changes to NGO Bet, then to NGO Gimel, then back to NGO Aleph, without any change of company code;
  • Changes of legal form, legal status, or tax status: Company A has the taxpayer status "Not a VAT payer", then was a "VAT payer" for Q1'21 for whatever reason;
  • There was a customs declaration 7 years ago; the court case on the amount of taxes ended, and changes were then made to the customs declaration, such as changes to the HS code and the amount of taxes paid (real-world case).

Describe the solution you'd like A clear, easy-to-use way to view and trace the history of changes made to an entity of any kind:

  • Being able to get time-machine views at different points in time (an extension to Timelines?);
  • The history of changes for a particular entity;
  • A clear way to specify, at upload time (alephclient), the timestamp of the changes and the fact that this is not a new entity but a change to an existing one.

Describe alternatives you've considered

  • Some sort of Slowly changing dimension Type 2 on the data level.
  • Better usage of Elasticsearch versioning capabilities.
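
To make the first alternative concrete, here is a minimal sketch of a Slowly Changing Dimension Type 2 history kept per entity property. It is not Aleph or followthemoney code: the `PropertyVersion` and `History` names, the single-valued properties, and the dates are all assumptions made for illustration, and changes are assumed to arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class PropertyVersion:
    """One SCD Type 2 row: a property value and the period it was valid."""
    entity_id: str
    prop: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


class History:
    """Hypothetical in-memory store; a real one would live in a database."""

    def __init__(self) -> None:
        self.rows: List[PropertyVersion] = []

    def _current(self, entity_id: str, prop: str) -> Optional[PropertyVersion]:
        for row in self.rows:
            if (row.entity_id, row.prop) == (entity_id, prop) and row.valid_to is None:
                return row
        return None

    def record(self, entity_id: str, prop: str, value: str, when: date) -> None:
        """Close the current row and open a new one if the value changed."""
        current = self._current(entity_id, prop)
        if current is not None:
            if current.value == value:
                return  # unchanged, nothing to version
            current.valid_to = when
        self.rows.append(PropertyVersion(entity_id, prop, value, when))

    def as_of(self, entity_id: str, when: date) -> Dict[str, str]:
        """Time-machine view: the property values valid on a given day."""
        view: Dict[str, str] = {}
        for row in self.rows:
            if row.entity_id != entity_id:
                continue
            if row.valid_from <= when and (row.valid_to is None or when < row.valid_to):
                view[row.prop] = row.value
        return view

    def changes(self, entity_id: str, prop: str) -> List[Tuple[date, str]]:
        """Change log for one property, in the order it was recorded."""
        return [
            (row.valid_from, row.value)
            for row in self.rows
            if (row.entity_id, row.prop) == (entity_id, prop)
        ]


# The renaming example from above, with made-up dates.
history = History()
history.record("ngo-1", "name", "NGO Aleph", date(2015, 1, 1))
history.record("ngo-1", "name", "NGO Bet", date(2017, 6, 1))
history.record("ngo-1", "name", "NGO Gimel", date(2019, 3, 1))
history.record("ngo-1", "name", "NGO Aleph", date(2021, 1, 1))

view_2018 = history.as_of("ngo-1", date(2018, 1, 1))
name_log = history.changes("ngo-1", "name")
```

Here `view_2018` holds the name that was valid at the start of 2018, and `name_log` lists all four renames, including the return to the original name. Real FtM properties are multi-valued, so an actual implementation would need to version sets of values rather than a single value per property.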

Additional context I think that this is supposed to be part not only of Aleph, but to some extent of followthemoney as a data standard. This could bring the whole FtM ecosystem to a different level of analytics.

ozhyrenkov · Jun 28 '22 14:06

FWIW the way I've been doing this in OpenSanctions (on top of followthemoney) is by using statement-based claim storage. This works well, but a) I'm not sure it would scale to anywhere near where Aleph is in terms of data volume (the current OS db is 7.4mn statements for 220000 entities), and b) it does lead to a weird impedance mismatch when we merge it back down into normal FtM.

More info: https://www.opensanctions.org/docs/statements/
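
For readers who have not seen the idea, a rough sketch of statement-based storage follows. It is not the OpenSanctions implementation: the `Statement` fields, the `taxStatus` property, and the dates are assumptions chosen to mirror the VAT example from the issue. Each statement is one claim about one property value, with the period over which it was observed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Statement:
    """One claim: 'entity X had value V for property P', seen in a dataset."""
    entity_id: str
    prop: str
    value: str
    dataset: str
    first_seen: str  # ISO date; ISO strings sort chronologically
    last_seen: str


def entity_as_of(statements: Iterable[Statement], entity_id: str, when: str) -> Dict[str, List[str]]:
    """Rebuild the entity from the statements observed on a given day."""
    props = defaultdict(set)
    for stmt in statements:
        if stmt.entity_id == entity_id and stmt.first_seen <= when <= stmt.last_seen:
            props[stmt.prop].add(stmt.value)
    return {prop: sorted(values) for prop, values in props.items()}


def flatten(statements: Iterable[Statement], entity_id: str) -> Dict[str, List[str]]:
    """Merge down to a plain multi-valued entity, dropping all time info."""
    props = defaultdict(set)
    for stmt in statements:
        if stmt.entity_id == entity_id:
            props[stmt.prop].add(stmt.value)
    return {prop: sorted(values) for prop, values in props.items()}


# Hypothetical data: "taxStatus" is an illustrative property name.
statements = [
    Statement("company-a", "name", "Company A", "registry", "2020-01-01", "2021-03-31"),
    Statement("company-a", "taxStatus", "Not a VAT payer", "registry", "2020-01-01", "2020-12-31"),
    Statement("company-a", "taxStatus", "VAT payer", "registry", "2021-01-01", "2021-03-31"),
]

mid_2020 = entity_as_of(statements, "company-a", "2020-06-15")
early_2021 = entity_as_of(statements, "company-a", "2021-02-01")
merged = flatten(statements, "company-a")
```

The two `entity_as_of` views each show a single tax status, while `merged` carries both values with no indication of which one applied when. That loss of ordering is one way to picture the mismatch that appears when statements are merged back into a normal FtM entity.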

Maybe one day there could be qualified values in FtM that also carry metadata (per property value) regarding source, timestamps, language, and even a quality rank like in Wikidata.

pudo · Jun 30 '22 08:06