DAOS-10037 cart: Replace swim D_CIRCLEQ
The crt_swim module uses a D_CIRCLEQ, essentially a linked list, to organize SWIM members. Since looking up an entry in a linked list is O(n), where n is the number of entries in the list, getting the SWIM state of a rank is a relatively expensive operation. This becomes a concern as we use SWIM states in more places, such as when canceling RPCs destined for dead ranks.
This patch replaces the D_CIRCLEQ with a hash table that stores the SWIM states and an array that stores a permutation of the SWIM member ranks. The former is keyed by swim_id_t and makes getting the SWIM state of a rank O(1); the latter retains the ability to shuffle SWIM members as required by the SWIM protocol. A few things to note:
- The semantics of crt_swim_rank_add no longer involve randomization; the caller must call the new crt_swim_rank_shuffle once for every batch of additions. This change improves the performance of adding a large number of ranks, such as when initializing a group. Adding a small number of ranks is currently slower than before, but the impact should be benign.
- In the future, we should further improve the performance of crt_group_primary_modify by sorting beforehand.
- Periodic shuffling is done neither before nor after this patch. It is left to future work if necessary.
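
The split between state lookup and probe order described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the crt_swim code: every demo_* name, the status enum, and MAX_RANKS are hypothetical, and a fixed-size array indexed directly by rank stands in for the hash table to keep the sketch short. The shuffle is a plain Fisher-Yates pass using rand().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; these are NOT the crt_swim types. */
enum { MAX_RANKS = 64 };	/* fixed capacity, an assumption of this sketch */
enum demo_status { DEMO_ABSENT = 0, DEMO_ALIVE, DEMO_SUSPECT, DEMO_DEAD };

struct demo_membs {
	enum demo_status state[MAX_RANKS];	/* state per rank, O(1) lookup */
	uint32_t	 order[MAX_RANKS];	/* permutation of member ranks */
	uint32_t	 count;			/* number of entries in order[] */
};

/* Append a rank without randomizing. Returns 0 on success, -1 on error. */
static int demo_rank_add(struct demo_membs *m, uint32_t rank)
{
	if (rank >= MAX_RANKS || m->state[rank] != DEMO_ABSENT)
		return -1;	/* out of range or already a member */
	assert(m->count < MAX_RANKS);	/* holds because each rank is added once */
	m->state[rank] = DEMO_ALIVE;
	m->order[m->count++] = rank;
	return 0;
}

/* Look up the state of a rank without walking a list. */
static enum demo_status demo_rank_state(const struct demo_membs *m, uint32_t rank)
{
	return rank < MAX_RANKS ? m->state[rank] : DEMO_ABSENT;
}

/* Fisher-Yates shuffle of the probe order; call once per batch of adds. */
static void demo_rank_shuffle(struct demo_membs *m)
{
	for (uint32_t i = m->count; i > 1; i--) {
		/* modulo bias of rand() is ignored in this sketch */
		uint32_t j = (uint32_t)rand() % i;
		uint32_t tmp = m->order[i - 1];

		m->order[i - 1] = m->order[j];
		m->order[j] = tmp;
	}
}
```

A caller in this sketch would zero-initialize a struct demo_membs, call demo_rank_add for each rank in a batch, and then call demo_rank_shuffle once. Adding n ranks then costs O(n) in total instead of paying for a randomized insertion on every add.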
Signed-off-by: Li Wei [email protected]
Required-githooks: true
Bug-tracker data: Ticket title is 'Use SWIM info to cancel RPCs among engines' Status is 'In Progress' https://daosio.atlassian.net/browse/DAOS-10037
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9308/6/execution/node/145/log
Test stage Unit Test with memcheck completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9308/7/execution/node/643/log
F_H_L daos_test/dfs: DAOS-11236
This should probably be run against a later version of the master branch, since it has been 19 days since the last merge.
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9308/12/display/redirect
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9308/12/display/redirect
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9308/13/display/redirect
@daos-stack/daos-gatekeeper, the request has been fulfilled.
Thanks, Ashley.