cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

scbuildstmt: ALTER PK drops rowid column when possible

Open postamar opened this issue 3 years ago • 5 comments

This commit improves the ALTER TABLE ... ALTER PRIMARY KEY ... implementation in the declarative schema changer by also having it drop the hidden rowid column when possible.

This required adding a missing rule which ensures that dropping columns only takes place after the indexes in which it's a key column are all also dropping.

Fixes #85814.

Release note (sql change): The implicitly-created rowid column which serves as PK on tables with no explicit PK now gets removed by an ALTER PRIMARY KEY statement when executed by the declarative schema changer and when there are no references to it elsewhere.

postamar avatar Aug 10 '22 14:08 postamar

This change is Reviewable

cockroach-teamcity avatar Aug 10 '22 14:08 cockroach-teamcity

cc @Xiang-Gu @ajwerner @ajstorm

postamar avatar Aug 10 '22 14:08 postamar

This error in TestPrimaryKeyChangeWithOperations is reproducible locally under stress. Obviously, something's wrong with our plan, but I'm struggling to make sense of it. The nullness errors are thrown when writing to the old primary index. I don't even understand how that is possible when the schema change is supposed to be blocked at the backfill.

postamar avatar Aug 10 '22 21:08 postamar

Gaah I keep forgetting that write-only isn't as strict as it sounds and actually allows reading data when it's for the purpose of doing a write, so to speak (because otherwise how do you do conditional puts). DELETE_ONLY it is. I think I know what to do. Thanks!

postamar avatar Aug 11 '22 01:08 postamar

Hmm, that doesn't seem to have done the trick. Let's look at this together tomorrow.

postamar avatar Aug 11 '22 02:08 postamar

I added what I thought was a decent rule but I'm still getting

--- FAIL: TestPrimaryKeyChangeWithOperations (5.60s)
    test_log_scope.go:162: test logs captured to: /Users/mariusposta/go/src/github.com/cockroachdb/cockroach/tmp/_tmp/484b17052552acaa61586f932b2ab60e/logTestPrimaryKeyChangeWithOperations675002425
    test_log_scope.go:80: use -show-logs to present logs inline
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:370: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:379: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:379: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:379: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:379: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:379: pq: null value in column "crdb_internal_column_3_name_placeholder" violates not-null constraint
    schema_changer_test.go:338: schema change ALTER TABLE t.test ALTER PRIMARY KEY USING COLUMNS (k) took 230.539125ms
    schema_changer_test.go:385: pq: duplicate key value violates unique constraint "test_pkey"
    schema_changer_test.go:385: pq: duplicate key value violates unique constraint "test_pkey"
    schema_changer_test.go:385: pq: duplicate key value violates unique constraint "test_pkey"
    schema_changer_test.go:385: pq: duplicate key value violates unique constraint "test_pkey"
    schema_changer_test.go:385: pq: duplicate key value violates unique constraint "test_pkey"
    schema_changer_test.go:2992: -- test log scope end --
FAIL

For reference here's the EXPLAIN plan I'm getting:

  Schema change plan for ALTER TABLE ‹defaultdb›.public.‹test› ALTER PRIMARY KEY USING COLUMNS (‹k›);
   ├── StatementPhase
   │    └── Stage 1 of 1 in StatementPhase
   │         ├── 1 element transitioning toward ABSENT
   │         │    └── PUBLIC → WRITE_ONLY    Column:{DescID: 104, ColumnID: 3}
   │         ├── 5 elements transitioning toward PUBLIC
   │         │    ├── ABSENT → PUBLIC        IndexColumn:{DescID: 104, ColumnID: 1, IndexID: 2}
   │         │    ├── ABSENT → PUBLIC        IndexColumn:{DescID: 104, ColumnID: 2, IndexID: 2}
   │         │    ├── ABSENT → BACKFILL_ONLY PrimaryIndex:{DescID: 104, IndexID: 2, ConstraintID: 1, TemporaryIndexID: 3, SourceIndexID: 1}
   │         │    ├── ABSENT → PUBLIC        IndexColumn:{DescID: 104, ColumnID: 1, IndexID: 3}
   │         │    └── ABSENT → PUBLIC        IndexColumn:{DescID: 104, ColumnID: 2, IndexID: 3}
   │         ├── 1 element transitioning toward TRANSIENT_ABSENT
   │         │    └── ABSENT → DELETE_ONLY   TemporaryIndex:{DescID: 104, IndexID: 3, SourceIndexID: 1}
   │         └── 8 Mutation operations
   │              ├── MakeDroppedColumnDeleteAndWriteOnly {"ColumnID":3,"TableID":104}
   │              ├── LogEvent {"TargetStatus":1}
   │              ├── MakeAddedIndexBackfilling {"Index":{"ConstraintID":1,"IndexID":2,"IsUnique":true,"SourceIndexID":1,"TableID":104,"TemporaryIndexID":3}}
   │              ├── MakeAddedTempIndexDeleteOnly {"Index":{"ConstraintID":1,"IndexID":3,"IsUnique":true,"SourceIndexID":1,"TableID":104}}
   │              ├── AddColumnToIndex {"ColumnID":1,"IndexID":3,"TableID":104}
   │              ├── AddColumnToIndex {"ColumnID":2,"IndexID":3,"Kind":2,"TableID":104}
   │              ├── AddColumnToIndex {"ColumnID":1,"IndexID":2,"TableID":104}
   │              └── AddColumnToIndex {"ColumnID":2,"IndexID":2,"Kind":2,"TableID":104}
   ├── PreCommitPhase
   │    └── Stage 1 of 1 in PreCommitPhase
   │         └── 2 Mutation operations
   │              ├── SetJobStateOnDescriptor {"DescriptorID":104,"Initialize":true}
   │              └── CreateSchemaChangerJob {"RunningStatus":"PostCommitPhase ..."}
   ├── PostCommitPhase
   │    ├── Stage 1 of 7 in PostCommitPhase
   │    │    ├── 1 element transitioning toward TRANSIENT_ABSENT
   │    │    │    └── DELETE_ONLY → WRITE_ONLY TemporaryIndex:{DescID: 104, IndexID: 3, SourceIndexID: 1}
   │    │    └── 3 Mutation operations
   │    │         ├── MakeAddedIndexDeleteAndWriteOnly {"IndexID":3,"TableID":104}
   │    │         ├── SetJobStateOnDescriptor {"DescriptorID":104}
   │    │         └── UpdateSchemaChangerJob {"RunningStatus":"PostCommitPhase ..."}
   │    ├── Stage 2 of 7 in PostCommitPhase
   │    │    ├── 1 element transitioning toward PUBLIC
   │    │    │    └── BACKFILL_ONLY → BACKFILLED PrimaryIndex:{DescID: 104, IndexID: 2, ConstraintID: 1, TemporaryIndexID: 3, SourceIndexID: 1}
   │    │    └── 1 Backfill operation
   │    │         └── BackfillIndex {"IndexID":2,"SourceIndexID":1,"TableID":104}
   │    ├── Stage 3 of 7 in PostCommitPhase
   │    │    ├── 1 element transitioning toward PUBLIC
   │    │    │    └── BACKFILLED → DELETE_ONLY PrimaryIndex:{DescID: 104, IndexID: 2, ConstraintID: 1, TemporaryIndexID: 3, SourceIndexID: 1}
   │    │    └── 3 Mutation operations
   │    │         ├── MakeBackfillingIndexDeleteOnly {"IndexID":2,"TableID":104}
   │    │         ├── SetJobStateOnDescriptor {"DescriptorID":104}
   │    │         └── UpdateSchemaChangerJob {"RunningStatus":"PostCommitPhase ..."}
   │    ├── Stage 4 of 7 in PostCommitPhase
   │    │    ├── 1 element transitioning toward PUBLIC
   │    │    │    └── DELETE_ONLY → MERGE_ONLY PrimaryIndex:{DescID: 104, IndexID: 2, ConstraintID: 1, TemporaryIndexID: 3, SourceIndexID: 1}
   │    │    └── 3 Mutation operations
   │    │         ├── MakeBackfilledIndexMerging {"IndexID":2,"TableID":104}
   │    │         ├── SetJobStateOnDescriptor {"DescriptorID":104}
   │    │         └── UpdateSchemaChangerJob {"RunningStatus":"PostCommitPhase ..."}
   │    ├── Stage 5 of 7 in PostCommitPhase
   │    │    ├── 1 element transitioning toward PUBLIC
   │    │    │    └── MERGE_ONLY → MERGED PrimaryIndex:{DescID: 104, IndexID: 2, ConstraintID: 1, TemporaryIndexID: 3, SourceIndexID: 1}
   │    │    └── 1 Backfill operation
   │    │         └── MergeIndex {"BackfilledIndexID":2,"TableID":104,"TemporaryIndexID":3}
   │    ├── Stage 6 of 7 in PostCommitPhase
   │    │    ├── 1 element transitioning toward PUBLIC
   │    │    │    └── MERGED → WRITE_ONLY PrimaryIndex:{DescID: 104, IndexID: 2, ConstraintID: 1, TemporaryIndexID: 3, SourceIndexID: 1}
   │    │    └── 3 Mutation operations
   │    │         ├── MakeMergedIndexWriteOnly {"IndexID":2,"TableID":104}
   │    │         ├── SetJobStateOnDescriptor {"DescriptorID":104}
   │    │         └── UpdateSchemaChangerJob {"RunningStatus":"PostCommitPhase ..."}
   │    └── Stage 7 of 7 in PostCommitPhase
   │         ├── 1 element transitioning toward PUBLIC
   │         │    └── WRITE_ONLY → VALIDATED PrimaryIndex:{DescID: 104, IndexID: 2, ConstraintID: 1, TemporaryIndexID: 3, SourceIndexID: 1}
   │         └── 1 Validation operation
   │              └── ValidateUniqueIndex {"IndexID":2,"TableID":104}
   └── PostCommitNonRevertiblePhase
        ├── Stage 1 of 4 in PostCommitNonRevertiblePhase
        │    ├── 2 elements transitioning toward ABSENT
        │    │    ├── WRITE_ONLY → DELETE_ONLY           Column:{DescID: 104, ColumnID: 3}
        │    │    └── PUBLIC     → ABSENT                ColumnName:{DescID: 104, Name: rowid, ColumnID: 3}
        │    ├── 1 element transitioning toward TRANSIENT_ABSENT
        │    │    └── WRITE_ONLY → TRANSIENT_DELETE_ONLY TemporaryIndex:{DescID: 104, IndexID: 3, SourceIndexID: 1}
        │    └── 5 Mutation operations
        │         ├── MakeDroppedColumnDeleteOnly {"ColumnID":3,"TableID":104}
        │         ├── SetColumnName {"ColumnID":3,"Name":"crdb_internal_co...","TableID":104}
        │         ├── MakeDroppedIndexDeleteOnly {"IndexID":3,"TableID":104}
        │         ├── SetJobStateOnDescriptor {"DescriptorID":104}
        │         └── UpdateSchemaChangerJob {"IsNonCancelable":true,"RunningStatus":"PostCommitNonRev..."}
        ├── Stage 2 of 4 in PostCommitNonRevertiblePhase
        │    ├── 5 elements transitioning toward ABSENT
        │    │    ├── PUBLIC    → ABSENT     IndexColumn:{DescID: 104, ColumnID: 3, IndexID: 1}
        │    │    ├── PUBLIC    → ABSENT     IndexColumn:{DescID: 104, ColumnID: 1, IndexID: 1}
        │    │    ├── PUBLIC    → ABSENT     IndexColumn:{DescID: 104, ColumnID: 2, IndexID: 1}
        │    │    ├── PUBLIC    → WRITE_ONLY PrimaryIndex:{DescID: 104, IndexID: 1, ConstraintID: 1}
        │    │    └── PUBLIC    → ABSENT     IndexName:{DescID: 104, Name: test_pkey, IndexID: 1}
        │    ├── 2 elements transitioning toward PUBLIC
        │    │    ├── VALIDATED → PUBLIC     PrimaryIndex:{DescID: 104, IndexID: 2, ConstraintID: 1, TemporaryIndexID: 3, SourceIndexID: 1}
        │    │    └── ABSENT    → PUBLIC     IndexName:{DescID: 104, Name: test_pkey, IndexID: 2}
        │    └── 9 Mutation operations
        │         ├── MakeDroppedPrimaryIndexDeleteAndWriteOnly {"IndexID":1,"TableID":104}
        │         ├── SetIndexName {"IndexID":1,"Name":"crdb_internal_in...","TableID":104}
        │         ├── SetIndexName {"IndexID":2,"Name":"test_pkey","TableID":104}
        │         ├── RemoveColumnFromIndex {"ColumnID":3,"IndexID":1,"TableID":104}
        │         ├── RemoveColumnFromIndex {"ColumnID":1,"IndexID":1,"Kind":2,"TableID":104}
        │         ├── RemoveColumnFromIndex {"ColumnID":2,"IndexID":1,"Kind":2,"Ordinal":1,"TableID":104}
        │         ├── MakeAddedPrimaryIndexPublic {"IndexID":2,"TableID":104}
        │         ├── SetJobStateOnDescriptor {"DescriptorID":104}
        │         └── UpdateSchemaChangerJob {"IsNonCancelable":true,"RunningStatus":"PostCommitNonRev..."}
        ├── Stage 3 of 4 in PostCommitNonRevertiblePhase
        │    ├── 1 element transitioning toward ABSENT
        │    │    └── WRITE_ONLY → DELETE_ONLY PrimaryIndex:{DescID: 104, IndexID: 1, ConstraintID: 1}
        │    └── 3 Mutation operations
        │         ├── MakeDroppedIndexDeleteOnly {"IndexID":1,"TableID":104}
        │         ├── SetJobStateOnDescriptor {"DescriptorID":104}
        │         └── UpdateSchemaChangerJob {"IsNonCancelable":true,"RunningStatus":"PostCommitNonRev..."}
        └── Stage 4 of 4 in PostCommitNonRevertiblePhase
             ├── 4 elements transitioning toward ABSENT
             │    ├── DELETE_ONLY           → ABSENT           Column:{DescID: 104, ColumnID: 3}
             │    ├── PUBLIC                → ABSENT           ColumnType:{DescID: 104, ColumnFamilyID: 0, ColumnID: 3}
             │    ├── PUBLIC                → ABSENT           ColumnDefaultExpression:{DescID: 104, ColumnID: 3}
             │    └── DELETE_ONLY           → ABSENT           PrimaryIndex:{DescID: 104, IndexID: 1, ConstraintID: 1}
             ├── 1 element transitioning toward TRANSIENT_ABSENT
             │    └── TRANSIENT_DELETE_ONLY → TRANSIENT_ABSENT TemporaryIndex:{DescID: 104, IndexID: 3, SourceIndexID: 1}
             └── 8 Mutation operations
                  ├── CreateGcJobForIndex {"IndexID":3,"TableID":104}
                  ├── MakeIndexAbsent {"IndexID":3,"TableID":104}
                  ├── CreateGcJobForIndex {"IndexID":1,"TableID":104}
                  ├── MakeIndexAbsent {"IndexID":1,"TableID":104}
                  ├── RemoveColumnDefaultExpression {"ColumnID":3,"TableID":104}
                  ├── MakeColumnAbsent {"ColumnID":3,"TableID":104}
                  ├── RemoveJobStateFromDescriptor {"DescriptorID":104}
                  └── UpdateSchemaChangerJob {"IsNonCancelable":true,"RunningStatus":"all stages compl..."}

(1 row)

Considering that the placeholder column name is mentioned in the test errors, I'm guessing that the error occurred after the first non-revertible txn commits, at which point the column should be DELETE_ONLY.

postamar avatar Aug 11 '22 18:08 postamar

The debugger tells me these errors happen in UpdateRow on the old primary index just after if was swapped out. I guess any UPDATE at that point needs to update both the old and the new primary indexes, but chokes on updating the old because it doesn't fetch the rowid value for some reason. I have no idea how this fetching works but if it fetches it in the new primary index then none of this can work.

The solution might be to do the ALTER PRIMARY KEY in two passes, one which replaces it but keeps rowid as a stored column, and another where the column is dropped. This means two backfills, which just feels really dumb. There has to be a better way.

postamar avatar Aug 11 '22 18:08 postamar

This PR uses a transient primary index in ALTER PRIMARY KEY when dropping the rowid column. The plan looks good from what I can tell, and the schema changer is happy as a clam. Unfortunately, TestPrimaryKeyChangeWithOperations still fails under stress. Stepping through the debugger I noticed that the descriptor version with the nullness error has the transient index as primary (with the new PK and rowid stored, as it should be) but, like previously, the fetched value for rowid was null again, which the original primary index doesn't like... This falls out of my area of expertise, could someone else please take a look? Also, I'm pretty beat.

Assuming that the above isn't a dealbreaker, I feel this PR would be ready for a round of review. Needless to say this PR is best reviewed commit-by-commit:

  • The first 5 commits are fairly self-contained changes in support of the others. I could tighten them a bit if anyone really cares.
  • The penultimate commit introduces a convenient way to opgen a transient path from public+absent, and contains a functionally-quasi-equivalent rewrite of the existing rules.
  • The final commit changes the behaviour of ALTER PRIMARY KEY to drop the rowid column. Annoyingly, to do so, we also need to add a transient primary indexed keyed by the new PK but containing rowid as a stored column. Gotta love having nearly 20 transactions to swap a PK...

postamar avatar Aug 12 '22 09:08 postamar

I've rebased and added a rule which makes the old primary index reach absent before the new primary index is public, when a third (transient) primary index is involved. The TestPrimaryKeyChangeWithOperations test still fails, unfortunately.

This required getting rid of the "all indexes reach absent in the same stage" rule, which is too bad but is probably bad anyway (let's model the index GC jobs as elements instead, and force them to transition in the same stage).

postamar avatar Aug 12 '22 17:08 postamar

This came along nicely. I'm closing this in favour of https://github.com/cockroachdb/cockroach/pull/86071, which is the same thing minus one or two commits, and minus the flakes, and with updated commit messages. Considering the amount of scrutiny the commits in this stack have been given over the last 72 hours by two engineers, they should be considered as having been reviewed.

postamar avatar Aug 12 '22 23:08 postamar