DAOS-10037 cart: Replace swim D_CIRCLEQ

Open liw opened this issue 3 years ago • 4 comments

The crt_swim module uses a D_CIRCLEQ, essentially a linked list, to organize SWIM members. Since looking up an entry in a linked list is O(n), where n is the number of entries in the list, getting the SWIM state of a rank is a relatively expensive operation. This becomes a concern as we use SWIM states in more places, such as when canceling RPCs destined for dead ranks.

This patch replaces the D_CIRCLEQ with a hash table that stores the SWIM states and another array that stores a permutation of the SWIM member ranks. The former is indexed by swim_id_t and makes getting the SWIM state of a rank O(1); the latter retains the ability to shuffle SWIM members as required by the SWIM protocol. A few things to note:

  • The semantics of crt_swim_rank_add no longer involve randomization; the caller needs to call the new crt_swim_rank_shuffle after every batch of additions. This change improves the performance of adding a large number of ranks, such as when initializing a group. The performance of adding a small number of ranks degrades for now, but the impact should be benign.

    • In the future, we should further improve the performance of crt_group_primary_modify by sorting beforehand.
  • Periodic shuffling was not done before this patch and is still not done after it. This is left to future work if necessary.
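For readers unfamiliar with the layout described above, here is a minimal, self-contained sketch of the idea: a hash table keyed by member ID for expected O(1) state lookups, plus a separate array holding a permutation of the member IDs that is shuffled once per batch of additions. This is not the code in this patch. Every name in it (`sketch_members`, `sketch_add`, `sketch_shuffle`, and so on) is hypothetical, `uint64_t` stands in for `swim_id_t`, and the fixed-size open-addressing table is an assumption made for brevity rather than the hash table the patch actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CAP 64 /* power of two; fixed size is an assumption of this sketch */

enum sketch_status { SKETCH_ALIVE, SKETCH_SUSPECT, SKETCH_DEAD };

struct sketch_entry {
	int                used;
	uint64_t           id;     /* stands in for swim_id_t */
	enum sketch_status status;
};

struct sketch_members {
	struct sketch_entry table[SKETCH_CAP]; /* hash table: id -> state */
	uint64_t            order[SKETCH_CAP]; /* permutation of member ids */
	size_t              count;
};

/* Linear probing: return the slot holding id, or the free slot where it would go. */
static size_t sketch_slot(const struct sketch_members *m, uint64_t id)
{
	size_t i = (size_t)((id * 0x9E3779B97F4A7C15ULL) >> 32) & (SKETCH_CAP - 1);

	while (m->table[i].used && m->table[i].id != id)
		i = (i + 1) & (SKETCH_CAP - 1);
	return i;
}

/* Add a member without any randomization. Returns 0, or -1 if full or duplicate. */
static int sketch_add(struct sketch_members *m, uint64_t id)
{
	size_t i;

	if (m->count >= SKETCH_CAP / 2) /* keep the table at most half full */
		return -1;
	i = sketch_slot(m, id);
	if (m->table[i].used)
		return -1;
	m->table[i].used = 1;
	m->table[i].id = id;
	m->table[i].status = SKETCH_ALIVE;
	m->order[m->count++] = id; /* appended in insertion order, not shuffled */
	return 0;
}

/* Expected O(1) state lookup. Returns 0 on success, -1 if id is unknown. */
static int sketch_get(const struct sketch_members *m, uint64_t id,
		      enum sketch_status *out)
{
	size_t i = sketch_slot(m, id);

	if (!m->table[i].used)
		return -1;
	*out = m->table[i].status;
	return 0;
}

/* Expected O(1) state update. Returns 0 on success, -1 if id is unknown. */
static int sketch_set(struct sketch_members *m, uint64_t id,
		      enum sketch_status status)
{
	size_t i = sketch_slot(m, id);

	if (!m->table[i].used)
		return -1;
	m->table[i].status = status;
	return 0;
}

/* Fisher-Yates shuffle of the order array; meant to run once per batch of adds.
 * rand() % i has a slight modulo bias, which this sketch ignores. */
static void sketch_shuffle(struct sketch_members *m)
{
	for (size_t i = m->count; i > 1; i--) {
		size_t   j = (size_t)rand() % i;
		uint64_t tmp = m->order[i - 1];

		m->order[i - 1] = m->order[j];
		m->order[j] = tmp;
	}
}
```

In this sketch a caller would zero-initialize a `struct sketch_members`, call `sketch_add` for each rank in a batch, and then call `sketch_shuffle` once. Shuffling once per batch costs O(n) in total, whereas randomizing the position on every single addition repeats that work per rank, which is the trade-off the first bullet describes.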

Signed-off-by: Li Wei [email protected]
Required-githooks: true

liw avatar Jun 09 '22 09:06 liw

Bug-tracker data: Ticket title is 'Use SWIM info to cancel RPCs among engines' Status is 'In Progress' https://daosio.atlassian.net/browse/DAOS-10037

github-actions[bot] avatar Aug 03 '22 08:08 github-actions[bot]

Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9308/6/execution/node/145/log

daosbuild1 avatar Aug 03 '22 08:08 daosbuild1

Test stage Unit Test with memcheck completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9308/7/execution/node/643/log

daosbuild1 avatar Aug 03 '22 09:08 daosbuild1

F_H_L daos_test/dfs: DAOS-11236

liw avatar Aug 05 '22 00:08 liw

This probably should be run against a later version of master branch since it's 19 days since last merge

jolivier23 avatar Sep 02 '22 16:09 jolivier23

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9308/12/display/redirect

daosbuild1 avatar Sep 03 '22 00:09 daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9308/12/display/redirect

daosbuild1 avatar Sep 03 '22 00:09 daosbuild1

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9308/13/display/redirect

daosbuild1 avatar Sep 03 '22 00:09 daosbuild1

> This probably should be run against a later version of master branch since it's 19 days since last merge

@daos-stack/daos-gatekeeper, the request has been fulfilled.

liw avatar Sep 05 '22 00:09 liw

Thanks, Ashley.

liw avatar Sep 05 '22 08:09 liw