Smart Test Retry with Failure Pattern Detection
What's the problem this feature will solve?
Flaky tests are a major pain point in modern CI/CD pipelines. Currently, pytest (and its ecosystem) supports retries via plugins like pytest-rerunfailures, but these retries are static: they simply rerun failed tests a fixed number of times without any analysis or adaptation.
This wastes CI time and makes it difficult for developers to distinguish between:
- genuine regressions,
- transient or environment-related issues (network, timeout, race condition),
- and inherently flaky test logic.
In other words, pytest lacks a built-in way to learn from historical test behavior or adjust retry strategies dynamically based on failure patterns.
Describe the solution you'd like
A Smart Retry System that intelligently manages flaky tests and adapts retry behavior using failure pattern detection.
Example:
```python
import pytest


@pytest.mark.smart_retry(max_attempts=3, analyze=True)
def test_api_call():
    # This test might fail intermittently due to network timeouts
    ...
```
Core behaviors (a rough sketch follows this list):
- Failure Categorization: Analyze stack traces and exception types to classify failures (e.g., network, timeout, assertion, resource contention).
- Retry Strategy Tuning: Automatically adjust how retries happen:
  - exponential backoff for timeouts,
  - no retry for deterministic assertion failures,
  - limited retries for transient network issues.
- Failure History Tracking: Store test outcomes and failure categories in a lightweight local DB (SQLite or JSON).
- Flakiness Metrics: Compute per-test flakiness rates across runs (e.g., "test_api_call failed 2/10 times in the last 5 builds").
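A rough sketch of how these behaviors could fit together is below; every name in it (`classify_failure`, `RetryPolicy`, `flakiness_rate`) is illustrative, not an existing pytest or plugin API:

```python
# Illustrative sketch only -- none of these names exist in pytest or a published plugin.
import socket
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_attempts: int          # total attempts, including the first run
    backoff_base: float = 0.0  # seconds; doubled on each retry when > 0


def classify_failure(exc: BaseException) -> str:
    """Map an exception raised by a test to a coarse failure category."""
    if isinstance(exc, AssertionError):
        return "assertion"  # deterministic -- retrying is pointless
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, (ConnectionError, socket.gaierror)):
        return "network"
    return "unknown"


# One policy per category, matching the behaviors listed above.
POLICIES = {
    "assertion": RetryPolicy(max_attempts=1),                   # no retry
    "timeout": RetryPolicy(max_attempts=3, backoff_base=1.0),   # exponential backoff
    "network": RetryPolicy(max_attempts=2, backoff_base=0.5),   # limited retries
    "unknown": RetryPolicy(max_attempts=1),
}


def flakiness_rate(outcomes: list[bool]) -> float:
    """Fraction of failed runs, e.g. 2 failures in 10 recorded runs -> 0.2."""
    return outcomes.count(False) / len(outcomes) if outcomes else 0.0
```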
Alternative Solutions
- pytest-rerunfailures: Provides basic retry capability but does not classify or analyze failures, and treats all failures equally.
- pytest-flakefinder: Helps detect flaky tests but reruns tests multiple times blindly and doesn’t adapt future retry behavior based on what it finds.
- CI-level reruns: Often implemented in GitHub Actions or Jenkins, but lack any integration with test logic or test metadata.
While these workarounds help mitigate flakes temporarily, they do not address the underlying need for failure intelligence and adaptive retries.
Additional context
This proposal can begin as a standalone plugin, e.g. pytest-smart-retry, built using pytest’s hook system (pytest_runtest_logreport, pytest_sessionfinish, etc.); a minimal skeleton is sketched below. Once mature, and if it gains community adoption, it could be integrated into core pytest or become an officially recommended plugin. I’d love to contribute if given the green light.
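As a very rough illustration, the skeleton could hook into pytest's reporting like this; the history file name and its layout are assumptions, only the hook names come from pytest itself:

```python
# Minimal sketch: record per-test outcomes and persist them across sessions.
# The file name and JSON layout are made up for illustration.
import json
from collections import defaultdict
from pathlib import Path

_HISTORY_FILE = Path(".smart_retry_history.json")
_outcomes: dict[str, list[str]] = defaultdict(list)


def pytest_runtest_logreport(report):
    """Collect the outcome of each test's call phase."""
    if report.when == "call":
        _outcomes[report.nodeid].append(report.outcome)  # "passed" / "failed" / "skipped"


def pytest_sessionfinish(session, exitstatus):
    """Merge this session's outcomes into the on-disk history."""
    history = json.loads(_HISTORY_FILE.read_text()) if _HISTORY_FILE.exists() else {}
    for nodeid, outcomes in _outcomes.items():
        history.setdefault(nodeid, []).extend(outcomes)
    _HISTORY_FILE.write_text(json.dumps(history, indent=2))
```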
This is out of scope
There should be a dedicated plugin for flaky tests
I consider consistently flaky tests a red flag, not something a core test framework should extensively support
Totally fair point, @RonnyPfannschmidt: consistently flaky tests are indeed a red flag and usually indicate deeper reliability issues that need fixing, not just rerunning. That said, I actually built a dedicated plugin called pytest-smart-rerun precisely to handle this responsibly.
It’s not meant to normalize flakiness; it’s meant to help teams detect, classify, and manage transient failures while they work on root causes. The plugin provides:
- Configurable retry logic for clearly transient errors (like network or async timeouts)
- Transparent JSON and CI summaries so nothing is hidden
- Optional integration with Model Context Protocol (MCP) for AI-assisted failure analysis, turning recurring flakiness into actionable insights rather than noise
The goal is to reduce false negatives in CI pipelines without compromising test integrity. And by keeping this functionality in a plugin rather than Pytest core, it stays true to Pytest’s philosophy of minimal core + powerful extensibility.
as long as there are clear and measurable indications it's a very different story - the key is always that there is an indication of flakiness/corrective retries
this may be aided by eventually providing a mechanism for reporting "poisoning" of higher-scope fixtures, so in the cases where the flakiness/failure needs a new instance of a fixture it can safely be provided
also something that core should provide is a way for plugins to re-schedule a test for later
we need a feature proposal document for the parts that core needs vs the parts the plugin needs
btw - I've been floating the idea of a general pytest MCP for a while now; that may need a separate plugin and extra hooks
That’s an excellent direction, @RonnyPfannschmidt, I completely agree. We can treat this in two clear layers:
(1) Plugin Scope (pytest-smart-rerun)
- Implements smart retry + flakiness detection
- Tracks failure categories and retry metadata
- Optionally integrates with MCP (Model Context Protocol) for intelligent post-run analysis
(2) Core Scope (Future Hooks / Support)
- Provide a standard way for plugins to reschedule tests dynamically (as you mentioned)
- Expose structured metadata about fixture poisoning or teardown errors, so plugins can react properly
- Possibly define a shared schema for “retry reasons” or “flakiness indicators” that plugins can report
I can start by drafting a short Feature Proposal Doc outlining:
- Which parts belong in the plugin vs core
- Example API for a pytest_reschedule_test hook or similar (a rough sketch appears after this comment)
- Data contracts for reporting flakiness indicators
Once you’re okay with that structure, I’ll open a PR with the plugin + reference doc.
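To make the core-side discussion concrete, here is one possible shape for such a hook; it does not exist in pytest today, and the name and signature are purely a strawman for the proposal doc:

```python
# Strawman only -- pytest_reschedule_test is NOT an existing pytest hook.
import pytest


@pytest.hookspec(firstresult=True)
def pytest_reschedule_test(item, reason):
    """Ask the session to run ``item`` again later in the same session.

    ``reason`` would carry structured retry metadata (for example a failure
    category such as "timeout" or "network") so reports can show why the
    rerun happened. Return True if the rescheduling request was accepted.
    """
```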
This may need some more design brainstorming
I'd love to see side-channel data in test reports that requests reruns and stores flakiness indications
The structure for that needs some experimentation
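One way to experiment with that today, purely as an illustration, is to piggyback on TestReport.user_properties; the "smart_retry" key and its payload below are made up, not an agreed schema:

```python
# Sketch: stash a flakiness indication on the report as side-channel data.
# The key name and payload are hypothetical, not an agreed-upon schema.
import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        # user_properties already flows through to reporters such as junitxml,
        # so it can act as a side channel until a dedicated field exists.
        report.user_properties.append(
            ("smart_retry", {"category": "timeout", "rerun_requested": True})
        )
```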
The fixture side is very unclear and possibly dependent on major refactoring