Support incremental migrations

Open arlobelshee opened this issue 1 year ago • 0 comments

Is your feature request related to a problem?

I frequently encounter 4 problems:

It is difficult to manage non-simultaneous release for multiple applications that share the same database.
It is difficult to manage schema change in a branch (even supabase DB branches has some problems), especially with multiple release lifecycle branches.
I can't do Ambler & Sadalge-style DB refactorings.
If I make a mistake with a migration, recovery is messy & error-prone.

Describe the solution you'd like

I want to have multi-version schemata. Ie, a database that supports any application that expects any schema version in a range of versions. This can be supported by extending the capabilities for migrations.

Specifically, each migration would:

Represent the transition between 2 specific versions (A->B, then B->C, ...), just like today.
Support 3-4 operations in place of today's 1. Each runs a different script supplied by the dev and performs some version shift:
- Apply: A->B' (to the CLI, this is today's Apply)
- Unapply: B'->A
- Finish: B'->B
- (optional) Unfinish: B->B' (used only to incrementally implement Finish).

Where version B' = union (A, B), such that it can be queried fully as if it were A and can be queried fully as if it were B.

Thus, a DB can support a version range [N, M] by being in state M' (base N), where that is union(M-1' (base N), M) => union(N, N+1, ..., M-1, M). This cannot be chained indefinitely (performance suffers with some composed transforms), but it can be chained for a very long time - long enough for real-world use at scale. The development team can advance the top and bottom of the range independently.

Describe alternatives you've considered

This can all be managed by splitting each migration into 2 migrations (###_Apply_Foo, ###+1_Finish_Foo). But there are some limits (see additional context).

Additional context

I would use this capability as follows, which would solve all of my above problems:

Local incremental migration writing:

Write some of the Apply script (first pass with diff, then tweak in later passes).
Extend Unapply to match changes in Apply (drop stuff I added; re-create stuff I dropped)
Run Apply.
Identify the differences between actual and desire.
Run Unapply.
Loop back to 1.

Then, do the same loop with Finish & Unfinish.

CI (with DB branches):

On commit to branch: Run Apply on the CI DB.
On merge to other branches: Run Apply on that branch DB.
- If only one application, or if this is the last application to merge this migration to this branch: Run Finish on that branch DB.
On merge to main: same, but on production.

CI (without DB branches):

On commit to branch: Run Apply on prod.
- Note: this requires a change in the migration list check, as prod will often have migrations from other branches Applied but not Finished.
On merge to main by last application:
- Run Finish on prod.

To support DB refactoring:

Like CI without branches, but can do Finish by some means other than CI.

To support making mistakes:

If a migration is committed and then found to be a problem a few commits later, I can update the migration (Apply and Unapply) and notify the users of the feature branch (there are few of us). We then:
1. Unapply all migrations back to the error (usually this is just the last one, but can be further).
2. Apply all migrations in the branch.
Poof! It's as if I never made the error.

I also use this technique to allow me to explore multiple ideas in parallel, using probably_wrong/ branches. Version history changes all the time on these branches & I don't want each to have a branch DB - just to all be with local. I'd like to maintain DB state between branch changes, as I'm often debugging something. So I just want to Unapply and then Apply back and forth.

Problems with using separate Finish migrations:

No support for Unapply, so doesn't solve incremental development of the migration (with commits), nor making mistakes or debugging with probably_wrong/ branches.
Need to keep the Finish migrations around on the feature branch (build incrementally), but not run them. So must initially create each with a name that won't be found, then remember to do a rename as part of PR accept / merge to main.
DB (without branches) will still complain about having had migrations for which there is no local code. So will have to cherry-pick migrations between feature branches whenever they are applied.

This is actually easier if doing full DB refactorings for multiple applications, as generally Apply_Foo and Finish_Foo will be done in different feature branches. However, then you have to remember to write Finish_Foo when you need it (and go re-learn what was done in Apply and what was left for Finish), or you need to write it initially in a file hidden from Supabase migrate, then rename it as part of the refactoring finish.

Refactoring Databases book

Jul 16 '24 10:07 arlobelshee