test: Add --repeat-until-n-failures flag to test runner
What is the problem this feature will solve?
Currently, debugging flaky tests requires either:
- Manually running tests repeatedly
- Using --repeat with a fixed number of iterations
- Writing custom test wrapper code
This makes it difficult to:
- Reliably reproduce failures
- Gather statistics about test flakiness
- Automate flaky test detection in CI
While V8's test runner has some flaky test handling capabilities, Node.js's main test runner lacks a built-in way to repeatedly run tests until a specific number of failures occur. The main use-case of course, would be to run until one failure is reached as opposed to having to repeatedly run tests hoping they fail, which even with the --repeated flag is rather tedious. A debugging flow I use is to run a test until a failure, then using a breakpoint at the failure, inspect the variables and call stack.
What is the feature you are proposing to solve the problem?
Add a new --repeat-until-n-failures flag to tools/test.py that allows tests to be repeated until a specified number of failures occur or a maximum iteration count is reached.
Proposed syntax: --repeat-until-n-failures=<n_failures>[,max_iterations]
Examples:
Run until 1 failure occurs
python tools/test.py --repeat-until-n-failures=1 test/path
Run until 5 failures or 1000 iterations
python tools/test.py --repeat-until-n-failures=5,1000 test/path
The feature would:
- Run the specified test(s) repeatedly
- Track the number of failures
- Stop when either:
- The specified number of failures is reached
- The maximum iteration count (if specified) is reached
- Report statistics about the failures (iteration counts, failure rate, etc.)
This would complement the existing --repeat and --exit-after-n-failures flags by providing a focused tool for debugging flaky tests and measuring test reliability.
I am happy to work on implementing this feature if there is interest.
What alternatives have you considered?
-
Using existing --repeat flag with a large number:
- Less efficient as it always runs all iterations
- Doesn't stop automatically on failure
- Harder to analyze results
-
Personal custom test script:
- More fragile, and less extensibility
- Maintenance burden as test runner evolves - need to update parsing logic with each change
- Isolated solution that doesn't benefit the Node.js ecosystem
@FelixVaughan are you planning to work on this, or else I can resolve this!
@0hmX yes! I left this up to get some initial discussion, but happy to start on an implementation
There has been no activity on this feature request for 5 months. To help maintain relevant open issues, please add the https://github.com/nodejs/node/labels/never-stale label or close this issue if it should be closed. If not, the issue will be automatically closed 6 months after the last non-automated comment. For more information on how the project manages feature requests, please consult the feature request management document.