bety icon indicating copy to clipboard operation
bety copied to clipboard

Import (sync) mechanism does not maintain database invariants.

Open gsrohde opened this issue 8 years ago • 2 comments

The load.bety.sh script circumvents database triggers (which enforce constraints) by using doing ALTER TABLE ${T} DISABLE TRIGGER ALL; before adding new rows from the server being imported from. Re-enabling the triggers doesn't check that invariants still hold. While value constraints are most likely maintained (assuming they are maintained on the server being imported from) foreign-key constraints and uniqueness constraints may not hold after the import is done.

For example, machine 0 (ebi-forecast) contains the citations_sites rows

bety=# select * from citations_sites where id in (1272,1273);
 citation_id |  site_id   |         created_at         |         updated_at         |  id  
-------------+------------+----------------------------+----------------------------+------
         844 | 1000000191 | 2015-10-26 17:36:01.566832 | 2015-10-26 17:36:01.566832 | 1272
         845 | 1000000191 | 2015-10-26 17:38:28.2554   | 2017-04-05 04:01:19.780502 | 1273

If a user imports from this machine but does not also import sites from machine 1, the side_id 1000000191 will likely refer to a row that doesn't exist.

UPDATE: Since uniqueness constraints are implemented via indexes, they seem to be unaffected by the "DISABLE TRIGGER" command. So it is probably mainly foreign-key constraints that are at issue.

gsrohde avatar Apr 07 '17 20:04 gsrohde

This more due to the fact that you only load half the database effectively.

robkooper avatar Nov 27 '17 18:11 robkooper

Off of the top of my head, it would seem that if the maintainer of machine X imports from machine Y, and if the maintainer of machine Y has imported from machines A, B, and C, then there is no guarantee that importing from machine Y won't create dangling foreign-key references unless maintainer of X imports from A, B, and C as well.

And of course this is recursive: if machine A has imported from machines D, E, and F, then importing from Y may entail importing from D, E, and F also.

You'd know more than I how much of a problem this is in practice. But aside from the question of whether this sometimes causes imports to fail, it is also inconvenient from a programming perspective not being able to assume that database invariants actually hold true.

gsrohde avatar Nov 27 '17 19:11 gsrohde