Smart Test Retry with Failure Pattern Detection
What's the problem this feature will solve?
Flaky tests are a major pain point in modern CI/CD pipelines. Currently, pytest (and its ecosystem) supports retries via plugins like pytest-rerunfailures, but these retries are static: they simply rerun failed tests a fixed number of times without any analysis or adaptation.
This wastes CI time and makes it difficult for developers to distinguish between:
- genuine regressions,
- transient or environment-related issues (network, timeout, race condition),
- and inherently flaky test logic.
In other words, pytest lacks a built-in way to learn from historical test behavior or adjust retry strategies dynamically based on failure patterns.
Describe the solution you'd like
A Smart Retry System that intelligently manages flaky tests and adapts retry behavior using failure pattern detection.
Example:
```python
import pytest


@pytest.mark.smart_retry(max_attempts=3, analyze=True)
def test_api_call():
    # This test might fail intermittently due to network timeouts
    ...
```
Core behaviors (a rough sketch follows this list):
- Failure Categorization: Analyze stack traces and exception types to classify failures (e.g., network, timeout, assertion, resource contention).
- Retry Strategy Tuning: Automatically adjust how retries happen:
  - exponential backoff for timeouts,
  - no retry for deterministic assertion failures,
  - limited retries for transient network issues.
- Failure History Tracking: Store test outcomes and failure categories in a lightweight local DB (SQLite or JSON).
- Flakiness Metrics: Compute per-test flakiness rates across runs (e.g., "test_api_call failed 2/10 times in the last 5 builds").
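A rough sketch of how these behaviors could fit together is below; every name in it (`classify_failure`, `RetryPolicy`, `flakiness_rate`) is illustrative, not an existing pytest or plugin API:

```python
# Illustrative sketch only -- none of these names exist in pytest or a published plugin.
import socket
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_attempts: int          # total attempts, including the first run
    backoff_base: float = 0.0  # seconds; doubled on each retry when > 0


def classify_failure(exc: BaseException) -> str:
    """Map an exception raised by a test to a coarse failure category."""
    if isinstance(exc, AssertionError):
        return "assertion"  # deterministic -- retrying is pointless
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, (ConnectionError, socket.gaierror)):
        return "network"
    return "unknown"


# One policy per category, matching the behaviors listed above.
POLICIES = {
    "assertion": RetryPolicy(max_attempts=1),                   # no retry
    "timeout": RetryPolicy(max_attempts=3, backoff_base=1.0),   # exponential backoff
    "network": RetryPolicy(max_attempts=2, backoff_base=0.5),   # limited retries
    "unknown": RetryPolicy(max_attempts=1),
}


def flakiness_rate(outcomes: list[bool]) -> float:
    """Fraction of failed runs, e.g. 2 failures in 10 recorded runs -> 0.2."""
    return outcomes.count(False) / len(outcomes) if outcomes else 0.0
```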
Alternative Solutions
- pytest-rerunfailures: Provides basic retry capability but does not classify or analyze failures, and treats all failures equally.
- pytest-flakefinder: Helps detect flaky tests but reruns tests multiple times blindly and doesn’t adapt future retry behavior based on what it finds.
- CI-level reruns: Often implemented in GitHub Actions or Jenkins, but lack any integration with test logic or test metadata.
While these workarounds help mitigate flakes temporarily, they do not address the underlying need for failure intelligence and adaptive retries.
Additional context
This proposal can begin as a standalone plugin, e.g. pytest-smart-retry, built using pytest’s hook system (pytest_runtest_logreport, pytest_sessionfinish, etc.); a minimal skeleton is sketched below. Once mature, and if it gains community adoption, it could be integrated into core pytest or become an officially recommended plugin. I’d love to contribute if given the green light.
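As a very rough illustration, the skeleton could hook into pytest's reporting like this; the history file name and its layout are assumptions, only the hook names come from pytest itself:

```python
# Minimal sketch: record per-test outcomes and persist them across sessions.
# The file name and JSON layout are made up for illustration.
import json
from collections import defaultdict
from pathlib import Path

_HISTORY_FILE = Path(".smart_retry_history.json")
_outcomes: dict[str, list[str]] = defaultdict(list)


def pytest_runtest_logreport(report):
    """Collect the outcome of each test's call phase."""
    if report.when == "call":
        _outcomes[report.nodeid].append(report.outcome)  # "passed" / "failed" / "skipped"


def pytest_sessionfinish(session, exitstatus):
    """Merge this session's outcomes into the on-disk history."""
    history = json.loads(_HISTORY_FILE.read_text()) if _HISTORY_FILE.exists() else {}
    for nodeid, outcomes in _outcomes.items():
        history.setdefault(nodeid, []).extend(outcomes)
    _HISTORY_FILE.write_text(json.dumps(history, indent=2))
```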
This is out of scope
There should be a dedicated plugin for flaky tests
I consider consistently flaky tests a red flag, not something a core test framework should extensively support
Totally fair point, @RonnyPfannschmidt: consistently flaky tests are indeed a red flag and usually indicate deeper reliability issues that need fixing, not just rerunning. That said, I actually built a dedicated plugin called pytest-smart-rerun precisely to handle this responsibly.
It’s not meant to normalize flakiness; it’s meant to help teams detect, classify, and manage transient failures while they work on root causes. The plugin provides:
- Configurable retry logic for clearly transient errors (like network or async timeouts)
- Transparent JSON and CI summaries so nothing is hidden
- Optional integration with Model Context Protocol (MCP) for AI-assisted failure analysis, turning recurring flakiness into actionable insights rather than noise
The goal is to reduce false negatives in CI pipelines without compromising test integrity. And by keeping this functionality in a plugin rather than Pytest core, it stays true to Pytest’s philosophy of minimal core + powerful extensibility.
as long as there are clear and measurable indications it's a very different story - the key is always that there is an indication of flakiness/corrective retries
this may be aided by eventually providing a mechanism for reporting "poisoning" of higher-scope fixtures, so in the cases where the flakiness/failure needs a new instance of a fixture it can safely be provided
also something that core should provide is a way for plugins to re-schedule a test for later
we need a feature proposal document for the parts that core needs vs the parts the plugin needs
btw - I've been floating the idea of a general pytest MCP for a while now; that may need a separate plugin and extra hooks
That’s an excellent direction, @RonnyPfannschmidt, I completely agree. We can treat this in two clear layers:
(1) Plugin Scope (pytest-smart-rerun)
- Implements smart retry + flakiness detection
- Tracks failure categories and retry metadata
- Optionally integrates with MCP (Model Context Protocol) for intelligent post-run analysis
(2) Core Scope (Future Hooks / Support)
- Provide a standard way for plugins to reschedule tests dynamically (as you mentioned)
- Expose structured metadata about fixture poisoning or teardown errors, so plugins can react properly
- Possibly define a shared schema for “retry reasons” or “flakiness indicators” that plugins can report
I can start by drafting a short Feature Proposal Doc outlining:
- Which parts belong in the plugin vs core
- Example API for a pytest_reschedule_test hook or similar (a rough sketch appears after this comment)
- Data contracts for reporting flakiness indicators
Once you’re okay with that structure, I’ll open a PR with the plugin + reference doc.
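To make the core-side discussion concrete, here is one possible shape for such a hook; it does not exist in pytest today, and the name and signature are purely a strawman for the proposal doc:

```python
# Strawman only -- pytest_reschedule_test is NOT an existing pytest hook.
import pytest


@pytest.hookspec(firstresult=True)
def pytest_reschedule_test(item, reason):
    """Ask the session to run ``item`` again later in the same session.

    ``reason`` would carry structured retry metadata (for example a failure
    category such as "timeout" or "network") so reports can show why the
    rerun happened. Return True if the rescheduling request was accepted.
    """
```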
This may need some more design brainstorming
I'd love to see side-channel data in test reports that requests reruns and stores flakiness indications
The structure for that needs some experimentation
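One way to experiment with that today, purely as an illustration, is to piggyback on TestReport.user_properties; the "smart_retry" key and its payload below are made up, not an agreed schema:

```python
# Sketch: stash a flakiness indication on the report as side-channel data.
# The key name and payload are hypothetical, not an agreed-upon schema.
import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        # user_properties already flows through to reporters such as junitxml,
        # so it can act as a side channel until a dedicated field exists.
        report.user_properties.append(
            ("smart_retry", {"category": "timeout", "rerun_requested": True})
        )
```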
The fixture side is very unclear and possibly dependent on major refactoring