Postgres migrations: incomplete transaction handling
Abstract
For postgres, the migration script and the _sqlx_migrations state change run in two different transactions, which requires that the migration scripts are idempotent. This is quite hard to archive for certain scripts. It would be nice if both parts would run within the same transaction.
Technical Details
See this code here:
https://github.com/launchbadge/sqlx/blob/d9fd21c94e57db3e02bf2ef6b3a9cc94296ce64a/sqlx-core/src/postgres/migrate.rs#L217-L247
Now imagine that the process dies here:
https://github.com/launchbadge/sqlx/blob/d9fd21c94e57db3e02bf2ef6b3a9cc94296ce64a/sqlx-core/src/postgres/migrate.rs#L229
Now the migration script was executed and committed, but SQLx forgot about that fact. A subsequent migrator run will execute the script again.
References
I'm quite certain that I've observed that issue with InfluxDB IOx (commit 500ece7c138e6b4f04b6761c68bd447ad69a6845) which uses sqlx = 0.6.0 and postgres 13.6.
I do agree, it seems strange for the insert into _sqlx_migrations to happen outside of the transaction. I believe it's so that execution_time includes the transaction commit (which is where most of the work is done), but it also seems strange to store that in the database in the first place.
To keep execution_time while fixing this issue, I would do the insert into _sqlx_migrations inside the transaction with execution_time set to 0, then commit the transaction and update execution_time afterward.
To keep
execution_timewhile fixing this issue, I would do the insert into_sqlx_migrationsinside the transaction withexecution_timeset to 0, then commit the transaction and updateexecution_timeafterward.
That sounds like a good way to solve it. Would a nested transaction construct also work?
Postgres doesn't support nested transactions. We simulate them using savepoints, which wouldn't work for that.
There's PREAPRE TRANSACTION which does most of the work of committing a transaction without fully publishing the changes, but I think that would be more error prone than it's worth. Mainly, if the final COMMIIT PREPARED statement to publish the changes doesn't get executed then the transaction state will be left on disk, including any locks that it held. That could easily deadlock the database since modifying tables requires locking them.
Worst case scenario with the fix I described above, you have migrations with a recorded 0 execution time. Or maybe it should default to -1? The column is NOT NULL.
I don't see why migration need that execution time in first place. I just played with migration yesterday and felt awkward with that column. I always think that the execution time for a migration isn't that important, and it'll be difference with each computer run it too so not much point for store that value.