Propagate rv to tLogs on version vector recovery

Open dlambrig opened this issue 1 year ago • 92 comments

This PR fixes many of the "quiet database" errors in simulation tests running version vector, caused by peeks waiting for versions that never arrive during recovery.

In version vector, a locked tLog may be asked for a version beyond what it has. During recovery, tLogs are peeked up to the recovery version (RV), but the RV can be higher than logData->version, because tLogs in version vector advance at different rates.

To avoid deadlocking, return with the end version received in the request, which is the RV if it was set to a valid version during recovery. Do not do this if the end version is not set or is not a valid version. For example, storage servers (SS) may also be peeking tLogs during recovery, but they may not yet know the RV, so they set the end version to infinity.
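The rule above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: `PeekAction`, `decidePeek`, `hasBoundedEnd`, and both sentinel constants are hypothetical names chosen for illustration, and the real tLog peek path has more state than the four inputs modeled here.

```cpp
#include <cstdint>
#include <limits>

using Version = std::int64_t;

// Hypothetical sentinels for this sketch only (assumptions, not FDB constants).
constexpr Version kInvalidVersion = -1;                                 // "end version not set"
constexpr Version kUnboundedEnd = std::numeric_limits<Version>::max(); // "infinity"

// True when the request carries a concrete end version, e.g. the RV.
constexpr bool hasBoundedEnd(Version requestedEnd) {
    return requestedEnd != kInvalidVersion && requestedEnd != kUnboundedEnd;
}

enum class PeekAction {
    Serve,               // the tLog already has data at `begin`
    ReplyAtRequestedEnd, // reply immediately, reporting the requested end version
    Wait                 // keep waiting for the tLog to reach `begin`
};

// Decide how a tLog answers a peek that starts at `begin`.
constexpr PeekAction decidePeek(bool locked, Version logVersion, Version begin, Version requestedEnd) {
    if (begin <= logVersion) {
        return PeekAction::Serve;
    }
    // A locked tLog will not advance, so waiting would never complete.
    // Only short-circuit when the caller supplied a valid, bounded end version.
    if (locked && hasBoundedEnd(requestedEnd)) {
        return PeekAction::ReplyAtRequestedEnd;
    }
    return PeekAction::Wait;
}
```

In this model, a peek from a storage server that does not yet know the RV passes `kUnboundedEnd` and takes the `Wait` branch, matching the exception described above.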

The PR modifies log routers instantiated for recovery to receive the RV in their initiation message, as do tLogs. This ensures they send a valid end version (RV) during recovery in peek messages.
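As a rough sketch of that idea, a log router could derive the end version for its peeks from an optional RV carried in its initialization data. `InitLogRouterSketch`, `peekEndVersion`, and `kNoEndVersion` are hypothetical stand-ins for illustration and do not mirror the actual FoundationDB message types.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

using Version = std::int64_t;

// Hypothetical "infinity" sentinel for this sketch only.
constexpr Version kNoEndVersion = std::numeric_limits<Version>::max();

// Hypothetical stand-in for a log router's initialization message.
struct InitLogRouterSketch {
    // Assumed to be set only when the router is instantiated for recovery.
    std::optional<Version> recoveryVersion;
};

// End version the router places in its outgoing peek requests:
// the RV when it is known, otherwise the unbounded sentinel.
constexpr Version peekEndVersion(const InitLogRouterSketch& init) {
    return init.recoveryVersion.value_or(kNoEndVersion);
}
```

Under this assumption, a router created for recovery always sends a bounded end version, while any other router keeps sending the unbounded value.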

This PR removes the previous solution, in which RPCs were sent to tLogs during recovery with the RV.

Joshua 20241008-140414-dlambrig-a9091fcbb91cc075 (commit 7)

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • [x] The PR has a description, explaining both the problem and the solution.
  • [x] The description mentions which forms of testing were done and the testing seems reasonable.
  • [x] Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • [ ] This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • [ ] There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

dlambrig avatar Sep 23 '24 15:09 dlambrig

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 16f7fb217dd100c574a9baf19563663ccc468ae4
  • Duration 0:06:24
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 15:09 foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 16f7fb217dd100c574a9baf19563663ccc468ae4
  • Duration 0:11:09
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 15:09 foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: 16f7fb217dd100c574a9baf19563663ccc468ae4
  • Duration 0:16:50
  • Result: :x: FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 15:09 foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: be140c5c2602c6bf7ef9c867a3ff7b19f425024d
  • Duration 0:07:00
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 15:09 foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: be140c5c2602c6bf7ef9c867a3ff7b19f425024d
  • Duration 0:11:32
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 16:09 foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: be140c5c2602c6bf7ef9c867a3ff7b19f425024d
  • Duration 0:15:21
  • Result: :x: FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 16:09 foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 16f7fb217dd100c574a9baf19563663ccc468ae4
  • Duration 0:53:04
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 16:09 foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 16f7fb217dd100c574a9baf19563663ccc468ae4
  • Duration 0:53:51
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci avatar Sep 23 '24 16:09 foundationdb-ci

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 16f7fb217dd100c574a9baf19563663ccc468ae4
  • Duration 0:54:52
  • Result: :x: FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 16:09 foundationdb-ci

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 16f7fb217dd100c574a9baf19563663ccc468ae4
  • Duration 1:05:52
  • Result: :x: FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 16:09 foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: be140c5c2602c6bf7ef9c867a3ff7b19f425024d
  • Duration 0:56:01
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci avatar Sep 23 '24 16:09 foundationdb-ci

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: be140c5c2602c6bf7ef9c867a3ff7b19f425024d
  • Duration 0:57:06
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 16:09 foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: be140c5c2602c6bf7ef9c867a3ff7b19f425024d
  • Duration 0:57:18
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 16:09 foundationdb-ci

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: be140c5c2602c6bf7ef9c867a3ff7b19f425024d
  • Duration 1:07:23
  • Result: :x: FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 17:09 foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: bcad4ec0b265dcd93dbfbcdb646a92895b8ab944
  • Duration 0:07:35
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 18:09 foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: bcad4ec0b265dcd93dbfbcdb646a92895b8ab944
  • Duration 0:12:32
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 18:09 foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: bcad4ec0b265dcd93dbfbcdb646a92895b8ab944
  • Duration 0:17:57
  • Result: :x: FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 18:09 foundationdb-ci

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: bcad4ec0b265dcd93dbfbcdb646a92895b8ab944
  • Duration 0:50:14
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 18:09 foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: bcad4ec0b265dcd93dbfbcdb646a92895b8ab944
  • Duration 0:52:10
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 18:09 foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: bcad4ec0b265dcd93dbfbcdb646a92895b8ab944
  • Duration 0:56:18
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci avatar Sep 23 '24 18:09 foundationdb-ci

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: bcad4ec0b265dcd93dbfbcdb646a92895b8ab944
  • Duration 1:05:32
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 19:09 foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 4c24394e161a5210998651d619068d9956346278
  • Duration 0:06:31
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 20:09 foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 4c24394e161a5210998651d619068d9956346278
  • Duration 0:11:14
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 20:09 foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: 4c24394e161a5210998651d619068d9956346278
  • Duration 0:21:17
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 20:09 foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 4c24394e161a5210998651d619068d9956346278
  • Duration 0:50:59
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 20:09 foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 4c24394e161a5210998651d619068d9956346278
  • Duration 0:59:00
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci avatar Sep 23 '24 20:09 foundationdb-ci

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 4c24394e161a5210998651d619068d9956346278
  • Duration 1:13:37
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 21:09 foundationdb-ci

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 4c24394e161a5210998651d619068d9956346278
  • Duration 1:16:05
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 21:09 foundationdb-ci

Does this PR handle all (known) race conditions (or, are there any cases that need follow up PRs)? Thanks!

sbodagala avatar Sep 23 '24 21:09 sbodagala

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: a33504aee44d6f93cf93f6fd35c26f5ea155641b
  • Duration 0:21:17
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Sep 23 '24 22:09 foundationdb-ci