firecracker Parametrize `test_benchmarks.py` test by criterion benchmarks

Currently, test_benchmarks.py contains a single test that runs all criterion benchmarks. With increasing number of benchmarks, we thus increase the duration of this test, and need to adjust its timeout. Instead, we want to parameterize the test by the list of criterion benchmarks we have in the repository. This means introducing a fixture that yields the benchmarks listed by cargo bench --all -- --list individually. Then the test itself would only run the benchmark it is provided. This should also make it easier to see which benchmarks are failing.

Originally posted by @pb8o in https://github.com/firecracker-microvm/firecracker/pull/4830#discussion_r1784428511

Oct 02 '24 15:10 roypat

If someone has not done it yet, I would take it over.

Oct 18 '24 10:10 MacOS

Hey @MacOS, please start working on it if you want.

Oct 30 '24 14:10 bchalios

@MacOS CC: @bchalios

Are you still working on this? If you are busy, may I handle it instead you?

Dec 21 '24 05:12 cm-iwata

I would submit something next week, however, I am not mad if you do it.

Jan 03 '25 19:01 MacOS

@MacOS @cm-iwata @bchalios

I wanted to check and see if this issue was still being worked on. If not, would I be able to work on it?

Apr 10 '25 20:04 gjkeller

@gjkeller I'm waiting for a pull request review...

Apr 10 '25 21:04 cm-iwata

@cm-iwata No worries! I was unsure of if it was complete due to the failing tests, but seems like the failure was only due to timeouts and network errors. Good luck with getting your PR merged!

Apr 11 '25 02:04 gjkeller

@gjkeller I'm waiting for a pull request review...

Hi, sorry, I thought the latest update on the PR was that you were looking into why the CI was failing. Did you need help with anything? :o

Apr 23 '25 13:04 roypat

@roypat I have made some adjustments, so could you please re-run the CI and review it?

Apr 27 '25 09:04 cm-iwata

We decided to remove the automated criterion benchmark runs as they've been causing more pain with flakiness than they've been useful.

Jun 26 '25 09:06 Manciukic