DAOS-11626 test: Adding MD on SSD metrics tests
Adding tests for WAL commit, replay, and checkpoint metrics.
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium-md-on-ssd: false
Allow-unstable-test: true
Test-tag: WalMetrics
Required-githooks: true
Before requesting gatekeeper:
- [ ] Two review approvals and any prior change requests have been resolved.
- [ ] Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
- [ ] Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
- [ ] Commit messages follow the guidelines outlined here.
- [ ] Any tests skipped by the ticket being addressed have been run and passed in the PR.
Gatekeeper:
- [ ] You are the appropriate gatekeeper to be landing the patch.
- [ ] The PR has 2 reviews by people familiar with the code, including appropriate watchers.
- [ ] Githooks were used. If not, request that user install them and check copyright dates.
- [ ] Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
- [ ] All builds have passed. Check non-required builds for any new compiler warnings.
- [ ] Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
- [ ] If applicable, the PR has addressed any potential version compatibility issues.
- [ ] Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
- [ ] Extra checks if forced landing is requested
  - [ ] Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
  - [ ] No new NLT or valgrind warnings. Check the classic view.
  - [ ] Quick-build or Quick-functional is not used.
- [ ] Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.
Bug-tracker data: Ticket title is 'Add tests for new md_on_ssd metrics' Status is 'In Progress' Labels: 'md_on_ssd,test_2.6' https://daosio.atlassian.net/browse/DAOS-11626
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13661/1/testReport/
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13661/2/display/redirect
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13661/2/display/redirect
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/3/execution/node/1003/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/3/execution/node/1054/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/5/execution/node/1094/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/5/execution/node/1116/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/6/execution/node/1010/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/6/execution/node/1061/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/8/execution/node/1011/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/8/execution/node/1113/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/9/execution/node/1010/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/9/execution/node/1113/log
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13661/10/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13661/11/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13661/12/testReport/
Example of the TelemetryUtils.verify_data() method output with failures:
2024-02-17 22:02:05,708 test L0476 INFO | ==> Step 7: Verifying collected metric data
2024-02-17 22:02:05,756 telemetry_utils L1134 INFO | --------------------------------------------------------------------------------
2024-02-17 22:02:05,756 telemetry_utils L1135 INFO | Telemetry Metric Verification
2024-02-17 22:02:05,756 telemetry_utils L1136 INFO | Metric Host Rank Target Value Check
2024-02-17 22:02:05,756 telemetry_utils L1137 INFO | --------------------------------- -------------- ----- ---------- ------------------ --------------------------
2024-02-17 22:02:05,756 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd wolf-[122-123] [0-1] [0-7,1024] 0 Pass
2024-02-17 22:02:05,756 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_max wolf-[122-123] [0-1] [0-7] 2 Pass
2024-02-17 22:02:05,757 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_max wolf-[122-123] [0-1] 1024 1 Pass
2024-02-17 22:02:05,757 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-122 0 0 0.5183823529411765 Pass
2024-02-17 22:02:05,757 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-122 0 1 0.5328467153284672 Pass
2024-02-17 22:02:05,757 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-[122-123] [0-1] 1024 0.5 Pass
2024-02-17 22:02:05,757 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-122 0 2 0.5075757575757576 Pass
2024-02-17 22:02:05,757 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-[122-123] [0-1] [2-3] 0.5078740157480315 Pass
2024-02-17 22:02:05,757 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-122 0 4 0.545774647887324 Pass
2024-02-17 22:02:05,757 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-122 0 5 0.5289256198347108 Pass
2024-02-17 22:02:05,758 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-122 0 6 0.5458015267175572 Pass
2024-02-17 22:02:05,758 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-122 0 7 0.5113636363636364 Pass
2024-02-17 22:02:05,758 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-123 1 0 0.524390243902439 Pass
2024-02-17 22:02:05,758 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-123 1 1 0.5436507936507936 Pass
2024-02-17 22:02:05,758 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-123 1 3 0.5113207547169811 Pass
2024-02-17 22:02:05,758 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-123 1 4 0.5181159420289855 Pass
2024-02-17 22:02:05,758 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-123 1 5 0.5107913669064749 Pass
2024-02-17 22:02:05,758 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-123 1 6 0.504 Pass
2024-02-17 22:02:05,759 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_mean wolf-123 1 7 0.5153846153846153 Pass
2024-02-17 22:02:05,759 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_min wolf-[122-123] [0-1] [0-7,1024] 0 Pass
2024-02-17 22:02:05,759 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-122 0 0 0.518206598003838 Fail (<0.5222329678670935)
2024-02-17 22:02:05,759 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-122 0 1 0.5313030429466798 Pass
2024-02-17 22:02:05,759 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-[122-123] [0-1] 1024 0.5006230532012171 Fail (<0.5007369200756457)
2024-02-17 22:02:05,759 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-122 0 2 0.5079439094257994 Fail (<0.5222329678670935)
2024-02-17 22:02:05,759 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-[122-123] [0-1] [2-3] 0.5082524181799968 Fail (<0.5222329678670935)
2024-02-17 22:02:05,760 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-122 0 4 0.5423995597898225 Pass
2024-02-17 22:02:05,760 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-122 0 5 0.5278869972131698 Pass
2024-02-17 22:02:05,760 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-122 0 6 0.5424622769462505 Pass
2024-02-17 22:02:05,760 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-122 0 7 0.5115957439996861 Fail (<0.5222329678670935)
2024-02-17 22:02:05,760 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-123 1 0 0.5237871589959653 Pass
2024-02-17 22:02:05,760 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-123 1 1 0.540671261122452 Pass
2024-02-17 22:02:05,760 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-123 1 3 0.5115528741456628 Fail (<0.5144957554275266)
2024-02-17 22:02:05,761 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-123 1 4 0.5179515014534789 Fail (<0.5222329678670935)
2024-02-17 22:02:05,761 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-123 1 5 0.51102308933672 Fail (<0.5222329678670935)
2024-02-17 22:02:05,761 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-123 1 6 0.5044729784358563 Fail (<0.5222329678670935)
2024-02-17 22:02:05,761 telemetry_utils L1142 INFO | engine_dmabuff_wal_qd_stddev wolf-123 1 7 0.5154210039604045 Fail (<0.5222329678670935)
2024-02-17 22:02:05,761 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz wolf-[122-123] [0-1] [0-7] 470 Fail (<[916,1594])
2024-02-17 22:02:05,761 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz wolf-[122-123] [0-1] 1024 520 Pass
2024-02-17 22:02:05,762 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_max wolf-[122-123] [0-1] [0-7,1024] 67130 Pass
2024-02-17 22:02:05,762 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_max wolf-122 0 7 137912 Pass
2024-02-17 22:02:05,762 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_max wolf-123 1 3 141228 Pass
2024-02-17 22:02:05,762 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-122 0 0 941.0955882352941 Fail (<12190.666666666666)
2024-02-17 22:02:05,762 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-122 0 1 941.5474452554745 Fail (<12190.666666666666)
2024-02-17 22:02:05,762 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-[122-123] [0-1] 1024 1171.8905472636816 Fail (<1249.6823529411765)
2024-02-17 22:02:05,762 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-122 0 2 947.1666666666666 Fail (<12190.666666666666)
2024-02-17 22:02:05,763 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-122 0 3 961.7874015748032 Fail (<12190.666666666666)
2024-02-17 22:02:05,763 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-122 0 4 928.669014084507 Fail (<12190.666666666666)
2024-02-17 22:02:05,763 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-122 0 5 971.3553719008264 Fail (<12190.666666666666)
2024-02-17 22:02:05,763 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-122 0 6 954.7480916030535 Fail (<12190.666666666666)
2024-02-17 22:02:05,763 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-122 0 7 1459.9015151515152 Fail (<12190.666666666666)
2024-02-17 22:02:05,763 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-123 1 0 966.5121951219512 Fail (<12190.666666666666)
2024-02-17 22:02:05,764 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-123 1 1 961.3095238095239 Fail (<12190.666666666666)
2024-02-17 22:02:05,764 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-123 1 2 958.9842519685039 Fail (<12190.666666666666)
2024-02-17 22:02:05,764 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-123 1 3 1470.3622641509435 Fail (<23972.666666666668)
2024-02-17 22:02:05,764 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-123 1 4 934.1159420289855 Fail (<12190.666666666666)
2024-02-17 22:02:05,764 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-123 1 5 934.7482014388489 Fail (<12190.666666666666)
2024-02-17 22:02:05,764 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-123 1 6 960.976 Fail (<12190.666666666666)
2024-02-17 22:02:05,765 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_mean wolf-123 1 7 951.6384615384616 Fail (<12190.666666666666)
2024-02-17 22:02:05,765 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_min wolf-[122-123] [0-1] [0-7] 50 Pass
2024-02-17 22:02:05,765 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_min wolf-[122-123] [0-1] 1024 210 Pass
2024-02-17 22:02:05,765 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-122 0 0 4042.1282899730422 Fail (<26929.58954508343)
2024-02-17 22:02:05,765 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-122 0 1 4027.3426244804177 Fail (<26929.58954508343)
2024-02-17 22:02:05,765 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-[122-123] [0-1] 1024 4708.72669497394 Fail (<5115.057370010947)
2024-02-17 22:02:05,766 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-122 0 2 4102.5244684147465 Fail (<26929.58954508343)
2024-02-17 22:02:05,766 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-122 0 3 4182.151240462439 Fail (<26929.58954508343)
2024-02-17 22:02:05,766 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-122 0 4 3956.1130854899598 Fail (<26929.58954508343)
2024-02-17 22:02:05,766 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-122 0 5 4284.142895167694 Fail (<26929.58954508343)
2024-02-17 22:02:05,766 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-122 0 6 4117.7233291257235 Fail (<26929.58954508343)
2024-02-17 22:02:05,766 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-122 0 7 9374.181423414268 Fail (<26929.58954508343)
2024-02-17 22:02:05,766 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-123 1 0 4249.2579767427505 Fail (<26929.58954508343)
2024-02-17 22:02:05,767 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-123 1 1 4198.5923441955665 Fail (<26929.58954508343)
2024-02-17 22:02:05,767 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-123 1 2 4182.225792186036 Fail (<26929.58954508343)
2024-02-17 22:02:05,767 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-123 1 3 9540.069195881493 Fail (<49106.89050428667)
2024-02-17 22:02:05,767 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-123 1 4 4012.655472735256 Fail (<26929.58954508343)
2024-02-17 22:02:05,767 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-123 1 5 3998.425154050264 Fail (<26929.58954508343)
2024-02-17 22:02:05,767 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-123 1 6 4215.267009248767 Fail (<26929.58954508343)
2024-02-17 22:02:05,768 telemetry_utils L1142 INFO | engine_dmabuff_wal_sz_stddev wolf-123 1 7 4133.58316522641 Fail (<26929.58954508343)
2024-02-17 22:02:05,768 telemetry_utils L1142 INFO | engine_dmabuff_wal_waiters wolf-[122-123] [0-1] [0-7,1024] 0 Pass
2024-02-17 22:02:05,768 telemetry_utils L1142 INFO | engine_dmabuff_wal_waiters_max wolf-[122-123] [0-1] [0-7,1024] 0 Pass
2024-02-17 22:02:05,768 telemetry_utils L1142 INFO | engine_dmabuff_wal_waiters_mean wolf-[122-123] [0-1] [0-7,1024] 0 Pass
2024-02-17 22:02:05,768 telemetry_utils L1142 INFO | engine_dmabuff_wal_waiters_min wolf-[122-123] [0-1] [0-7,1024] 0 Pass
2024-02-17 22:02:05,768 telemetry_utils L1142 INFO | engine_dmabuff_wal_waiters_stddev wolf-[122-123] [0-1] [0-7,1024] 0 Pass
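To make the table above easier to read, here is a minimal sketch of how a Check label such as `Fail (<0.5222329678670935)` and a grouped Target such as `[0-7,1024]` could be produced. This is not the TelemetryUtils.verify_data() implementation; `check_value` and `collapse` are hypothetical helper names, and the min/max threshold semantics are an assumption inferred from the log output.

```python
from typing import Iterable, Optional, Union

Number = Union[int, float]


def check_value(value: Number, minimum: Optional[Number] = None,
                maximum: Optional[Number] = None) -> str:
    """Return 'Pass' or a 'Fail (...)' label for one metric value.

    Assumption: a value fails when it is below the minimum or above the
    maximum, and the label names the threshold that was violated.
    """
    if minimum is not None and value < minimum:
        return f"Fail (<{minimum})"
    if maximum is not None and value > maximum:
        return f"Fail (>{maximum})"
    return "Pass"


def collapse(values: Iterable[int]) -> str:
    """Collapse integers into a compact range string, e.g. '[0-7,1024]'.

    A single value is returned without brackets, matching rows such as
    Target '1024' in the log above.
    """
    ordered = sorted(set(values))
    if not ordered:
        return ""
    if len(ordered) == 1:
        return str(ordered[0])
    parts = []
    start = prev = ordered[0]
    for item in ordered[1:]:
        if item == prev + 1:
            # Still inside a consecutive run; extend it.
            prev = item
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = item
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "[" + ",".join(parts) + "]"


if __name__ == "__main__":
    # Values taken from the engine_dmabuff_wal_qd_stddev rows above.
    print(check_value(0.518206598003838, minimum=0.5222329678670935))
    print(check_value(0.5313030429466798, minimum=0.5222329678670935))
    print(collapse(list(range(8)) + [1024]))
```

Rows that share the same value and check result can then be printed once, with their hosts, ranks, and targets collapsed, which is why some lines in the log cover `[0-7,1024]` while others list a single target.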
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/13/execution/node/1014/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/13/execution/node/1030/log
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13661/14/testReport/
Ticket title is 'Add tests for new md_on_ssd metrics' Status is 'In Review' Labels: 'md_on_ssd,test_2.6' https://daosio.atlassian.net/browse/DAOS-11626
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/15/execution/node/781/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/15/execution/node/993/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/15/execution/node/1085/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13661/16/execution/node/781/log