tests: parametrize benchmark tests
In the previous implementation, it was necessary to adjust the timeout value every time a benchmark test was added. By parametrizing the benchmark tests, the time required for each test becomes predictable, eliminating the need to adjust the timeout value.
Changes
Parametrize the test by the list of criterion benchmarks.
As a consequence of parametrizing the tests, `git clone` will be executed once for each parameter here:
https://github.com/firecracker-microvm/firecracker/blob/70ac1544d3bb460c47d93132f59a9e35cd5ad3da/tests/framework/ab_test.py#L85-L98
Running all parametrized tests with a single `git clone` would require major revisions to `git_ab_test`, so this PR does not address that issue.
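The parametrization described above can be sketched as follows. This is an illustrative example, not the actual Firecracker test code: the benchmark list and the test body are placeholders, and in the real suite the benchmark names come from the cargo workspace.

```python
import pytest

# Hypothetical list of criterion benchmark names (taken from the test
# timings discussed later in this PR); the real list is derived from the
# benchmarks defined in the cargo workspace.
CRITERION_BENCHMARKS = [
    "page_fault",
    "queue_add_used_16",
    "queue_add_used_256",
    "queue_pop_16",
    "request_parse",
    "next_descriptor_16",
    "deserialize_cpu_template",
    "serialize_cpu_template",
]


def parametrized_test_ids(test_name, params):
    """Mimic the item names pytest generates, e.g. test_foo[page_fault]."""
    return [f"{test_name}[{p}]" for p in params]


@pytest.mark.parametrize("benchmark", CRITERION_BENCHMARKS)
def test_no_regression_relative_to_target_branch(benchmark):
    # Illustrative body: each parameter becomes its own test item with its
    # own predictable timeout budget, instead of one monolithic test that
    # runs every benchmark back to back.
    assert benchmark in CRITERION_BENCHMARKS
```

Because pytest treats each parameter as a separate test item, adding a new benchmark only adds one more item of roughly known duration rather than lengthening a single test past its timeout.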
Reason
Closes #4832.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.
PR Checklist
- [x] I have read and understand CONTRIBUTING.md.
- [x] I have run `tools/devtool checkstyle` to verify that the PR passes the automated style checks.
- [x] I have described what is done in these changes, why they are needed, and how they are solving the problem in a clear and encompassing way.
- [ ] I have updated any relevant documentation (both in code and in the docs) in the PR.
- [ ] I have mentioned all user-facing changes in `CHANGELOG.md`.
- [x] If a specific issue led to this PR, this PR closes the issue.
- [ ] When making API changes, I have followed the Runbook for Firecracker API changes.
- [x] I have tested all new and changed functionalities in unit tests and/or integration tests.
- [ ] I have linked an issue to every new `TODO`.
- [ ] This functionality cannot be added in `rust-vmm`.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 83.06%. Comparing base (`ae078ee`) to head (`d1226ab`). Report is 136 commits behind head on main.
:exclamation: Current head d1226ab differs from pull request most recent head 420a113
Please upload reports for the commit 420a113 to get more accurate results.
Additional details and impacted files
@@ Coverage Diff @@
## main #4974 +/- ##
=======================================
Coverage 83.06% 83.06%
=======================================
Files 250 250
Lines 26897 26897
=======================================
Hits 22342 22342
Misses 4555 4555
| Flag | Coverage Δ | |
|---|---|---|
| 5.10-c5n.metal | 83.56% <ø> (ø) | |
| 5.10-m5n.metal | 83.56% <ø> (-0.01%) | :arrow_down: |
| 5.10-m6a.metal | 82.79% <ø> (-0.01%) | :arrow_down: |
| 5.10-m6g.metal | 79.34% <ø> (ø) | |
| 5.10-m6i.metal | 83.55% <ø> (-0.01%) | :arrow_down: |
| 5.10-m7a.metal-48xl | 82.77% <ø> (-0.01%) | :arrow_down: |
| 5.10-m7g.metal | 79.34% <ø> (ø) | |
| 5.10-m7i.metal-24xl | 83.51% <ø> (-0.01%) | :arrow_down: |
| 5.10-m7i.metal-48xl | 83.52% <ø> (ø) | |
| 5.10-m8g.metal-24xl | 79.34% <ø> (ø) | |
| 5.10-m8g.metal-48xl | 79.34% <ø> (ø) | |
| 6.1-c5n.metal | 83.61% <ø> (ø) | |
| 6.1-m5n.metal | 83.61% <ø> (ø) | |
| 6.1-m6a.metal | 82.84% <ø> (+<0.01%) | :arrow_up: |
| 6.1-m6g.metal | 79.34% <ø> (ø) | |
| 6.1-m6i.metal | 83.60% <ø> (+<0.01%) | :arrow_up: |
| 6.1-m7a.metal-48xl | 82.82% <ø> (ø) | |
| 6.1-m7g.metal | 79.34% <ø> (ø) | |
| 6.1-m7i.metal-24xl | 83.61% <ø> (-0.01%) | :arrow_down: |
| 6.1-m7i.metal-48xl | 83.62% <ø> (ø) | |
| 6.1-m8g.metal-24xl | 79.34% <ø> (ø) | |
| 6.1-m8g.metal-48xl | 79.34% <ø> (ø) | |
Flags with carried forward coverage won't be shown.
Mh, it would seem that it's still compiling everything multiple times judging by the test runtimes, and that's why it's timing out? :thinking:
@roypat Sorry for the late reply. After investigation, I found that the timeout threshold was too short. Let's take the following build failure log as an example.
https://buildkite.com/firecracker/firecracker-pr-optional/builds/11752#01967b0d-070f-4a47-ab7c-8aa74ec201ba
It contains the following test timings:
```
419.83s call integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[page_fault]
178.50s call integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_16]
33.90s call integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_256]
31.28s call integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_pop_16]
31.24s call integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[request_parse]
30.96s call integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[next_descriptor_16]
24.40s call integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[deserialize_cpu_template]
20.43s call integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[serialize_cpu_template]
0.17s setup integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[page_fault]
0.09s call integration_tests/functional/test_cpu_template_helper.py::test_guest_cpu_config_change
```
- First, the `page_fault` parametrized test is run. Since this is the first test run, a process is started to build the benchmark binary.
- This build does not complete within 420 seconds, so the test fails and the binary build is interrupted.
- Next, `queue_add_used_16` is run. Since the binary has not yet been built, cargo builds it again, but thanks to the build cache generated during the `page_fault` run, this build completes relatively quickly. `queue_add_used_16` then completes in 178.5s.
- Next, `queue_add_used_256` is run. Since a pre-built binary now exists, this test completes quickly, in about 30s. Subsequent tests behave similarly.
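A quick sanity check on those timings supports this reading (illustrative arithmetic only; the numbers are copied from the log above, and the variable names are mine):

```python
# Timings copied from the failure log above, in seconds.
timings = {
    "page_fault": 419.83,         # first test: also pays the cargo build cost
    "queue_add_used_16": 178.50,  # rebuild from a warm cache + benchmark run
    "queue_add_used_256": 33.90,  # pre-built binary: benchmark cost only
}

OLD_TIMEOUT = 420  # the per-test threshold the first test ran into

# The first test was cut off just under the threshold, mid-build; once a
# binary exists, a test costs only ~30 s. The difference approximates the
# build overhead that the first parametrized test has to absorb, and hence
# how much headroom the per-test timeout needs.
build_overhead = timings["page_fault"] - timings["queue_add_used_256"]
print(f"approximate build overhead: {build_overhead:.2f}s")
```

This suggests the first test item needs a budget of roughly benchmark time plus a full cold build, which is why raising the threshold fixes the failure.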
I was able to reproduce the same issue locally, so I increased the timeout threshold. Could you run CI again?
We decided to remove the automated criterion benchmark runs (#5273), as they've been causing more pain with flakiness than they've been useful, so this PR is no longer relevant.