Fix stale read of some acknowledged writes after a table split

Open snaury opened this issue 2 years ago • 3 comments

Changelog entry

Fix stale read of some acknowledged writes after a table split.

Changelog category

  • Bugfix

Additional information

Jepsen detected some G-single-item-realtime anomalies that corresponded to a stale read immediately after a table split. Investigation showed several cases where a single-shard write could be acknowledged to clients by the source shard while destination shards considered those versions unacknowledged because of lagging mediator time. The underlying races caused several unintended side effects:

  • Destination shards could attach to mediator time before they had all the necessary information:
    • Non-repeatable reads: destination shards could select a new write version which was supposed to be frozen by a repeatable snapshot read at their source shard during split
    • Stale reads: destination shards could select a new read version which was acknowledged by a single-shard write at their source shard during split
  • Source shards could reply to writes after destination shards had fully initialized, which could cause stale reads because mediator time was lagging at the destination shards' nodes
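
The stale-read case in this list can be illustrated with a toy model. This is a minimal sketch, not YDB code: `ToyShard`, `split`, and the rule that a read sees only versions at or below the node's local mediator time are simplifying assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Hypothetical, heavily simplified shard (not YDB's datashard)."""
    mediator_time: int                         # latest mediator time seen on this node
    rows: dict = field(default_factory=dict)   # key -> list of (version, value)

    def write(self, key, value):
        # Assumption: a single-shard write is stamped with the local mediator
        # time and acknowledged to the client right away.
        version = self.mediator_time
        self.rows.setdefault(key, []).append((version, value))
        return version

    def read(self, key):
        # Assumption: a read sees only versions at or below local mediator time.
        visible = [p for p in self.rows.get(key, []) if p[0] <= self.mediator_time]
        return max(visible, key=lambda p: p[0])[1] if visible else None


def split(source, dst_mediator_time):
    # The destination receives a copy of the data but runs on a node whose
    # mediator time may lag behind the source's.
    copied = {k: list(v) for k, v in source.rows.items()}
    return ToyShard(mediator_time=dst_mediator_time, rows=copied)


source = ToyShard(mediator_time=5)
source.write("k", "old")
source.mediator_time = 10
source.write("k", "new")                       # acknowledged at version 10

lagging = split(source, dst_mediator_time=8)
caught_up = split(source, dst_mediator_time=10)
```

In this model `lagging.read("k")` returns `"old"` even though the write of `"new"` was already acknowledged, while `caught_up.read("k")` returns `"new"`. That is the shape of the anomaly: the data is present at the destination, but its lagging local time hides it.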

These issues are fixed in two ways. First, mediator time restore does not start until destination shards have received all snapshots. This ensures destination shards await a mediator time that is not less than the last one theoretically observed by source shards at the time they sent their snapshots. Second, source shards do not send delayed replies after a snapshot is prepared. This ensures destination shards may trust their local mediator time to determine write visibility.
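
The two rules can be sketched as a pair of small state machines. This is an illustrative model under stated assumptions, not the actual datashard implementation: the class names, the `observed_time` field, and the idea of passing one integer time per snapshot are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToySource:
    """Hypothetical source shard; names are illustrative, not YDB's."""
    observed_time: int               # highest mediator time it may have observed
    snapshot_prepared: bool = False

    def prepare_snapshot(self) -> int:
        # Once the snapshot is prepared, delayed write replies are suppressed.
        self.snapshot_prepared = True
        return self.observed_time

    def may_send_delayed_reply(self) -> bool:
        return not self.snapshot_prepared


@dataclass
class ToyDestination:
    """Hypothetical destination shard that gates mediator time restore."""
    expected_sources: set
    received: dict = field(default_factory=dict)   # source id -> observed_time

    def receive_snapshot(self, source_id: str, observed_time: int) -> None:
        self.received[source_id] = observed_time

    def restore_target(self):
        # Restore does not start until every expected snapshot has arrived.
        if set(self.received) != self.expected_sources:
            return None
        # Wait for a time not less than anything a source may have observed.
        return max(self.received.values())

    def can_serve(self, local_mediator_time: int) -> bool:
        target = self.restore_target()
        return target is not None and local_mediator_time >= target


src_a = ToySource(observed_time=10)
src_b = ToySource(observed_time=12)
dst = ToyDestination(expected_sources={"a", "b"})

dst.receive_snapshot("a", src_a.prepare_snapshot())
partial_target = dst.restore_target()          # still waiting for "b"
dst.receive_snapshot("b", src_b.prepare_snapshot())
full_target = dst.restore_target()
```

With only one of two snapshots received, `restore_target()` returns `None`, so the destination cannot pick a read or write version yet. After both arrive it waits for the larger of the two observed times before serving.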

Partially fixes KIKIMR-21065.

snaury avatar Feb 27 '24 15:02 snaury

:white_circle: 2024-02-27 15:57:09 UTC Pre-commit check for 1d8796c07b53cafb118c6324a33bc0476c855876 has started.
:white_circle: 2024-02-27 15:57:10 UTC Build linux-x86_64-release-cmake14 is running...
:green_circle: 2024-02-27 15:59:33 UTC Build successful.

github-actions[bot] avatar Feb 27 '24 15:02 github-actions[bot]

:white_circle: 2024-02-27 15:59:36 UTC Pre-commit check for 1d8796c07b53cafb118c6324a33bc0476c855876 has started.
:white_circle: 2024-02-27 15:59:40 UTC Build linux-x86_64-relwithdebinfo is running...
:green_circle: 2024-02-27 16:02:10 UTC Build successful.
:white_circle: 2024-02-27 16:02:23 UTC Tests are running...
:red_circle: 2024-02-27 17:23:17 UTC Some tests failed, follow the links below.

Test history

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|-------|--------|--------|--------|---------|--------|
| 67969 | 57061 | 0 | 3 | 10883 | 22 |

github-actions[bot] avatar Feb 27 '24 15:02 github-actions[bot]

:white_circle: 2024-02-27 16:04:16 UTC Pre-commit check for 1d8796c07b53cafb118c6324a33bc0476c855876 has started.
:white_circle: 2024-02-27 16:04:18 UTC Build linux-x86_64-release-asan is running...
:green_circle: 2024-02-27 16:06:50 UTC Build successful.
:white_circle: 2024-02-27 16:06:59 UTC Tests are running...
:red_circle: 2024-02-27 17:48:43 UTC Some tests failed, follow the links below.

Test history

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|-------|--------|--------|--------|---------|--------|
| 14877 | 14715 | 0 | 35 | 101 | 26 |

github-actions[bot] avatar Feb 27 '24 16:02 github-actions[bot]

The Cdc.ResolvedTimestamps test exposed a TxInit race at the split destination; need to find a proper non-racy way to initialize after a split.

snaury avatar Feb 28 '24 09:02 snaury

:white_circle: 2024-02-28 13:56:49 UTC Pre-commit check for 73264da73bffa315071a47804787b75077405999 has started.
:white_circle: 2024-02-28 13:56:51 UTC Build linux-x86_64-release-asan is running...
:green_circle: 2024-02-28 14:12:23 UTC Build successful.
:white_circle: 2024-02-28 14:12:35 UTC Tests are running...
:red_circle: 2024-02-28 15:54:23 UTC Some tests failed, follow the links below.

Test history

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|-------|--------|--------|--------|---------|--------|
| 14879 | 14676 | 0 | 37 | 120 | 46 |

github-actions[bot] avatar Feb 28 '24 13:02 github-actions[bot]

:white_circle: 2024-02-28 13:56:50 UTC Pre-commit check for 73264da73bffa315071a47804787b75077405999 has started.
:white_circle: 2024-02-28 13:56:52 UTC Build linux-x86_64-release-cmake14 is running...
:green_circle: 2024-02-28 14:20:46 UTC Build successful.

github-actions[bot] avatar Feb 28 '24 13:02 github-actions[bot]

:white_circle: 2024-02-28 13:57:52 UTC Pre-commit check for 73264da73bffa315071a47804787b75077405999 has started.
:white_circle: 2024-02-28 13:57:54 UTC Build linux-x86_64-relwithdebinfo is running...
:green_circle: 2024-02-28 14:10:26 UTC Build successful.
:white_circle: 2024-02-28 14:10:38 UTC Tests are running...
:red_circle: 2024-02-28 15:42:12 UTC Some tests failed, follow the links below.

Test history

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|-------|--------|--------|--------|---------|--------|
| 68022 | 57071 | 0 | 9 | 10890 | 52 |

github-actions[bot] avatar Feb 28 '24 13:02 github-actions[bot]

The failing tests all seem to be KV/PQ tests, unrelated to datashards.

snaury avatar Feb 29 '24 08:02 snaury