Improve regression detection
Currently, regressions are determined by comparing the current build against the previous build and highlighting any tests that are now failing but passed in the previous build.
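For illustration, a minimal sketch of that comparison, assuming each build's results are available as a plain mapping from test name to status ("pass", "fail" or "n/a"); the names are hypothetical and not actual qa-reports code:

```python
# Hypothetical sketch, not actual qa-reports code: build results are assumed
# to be plain dicts mapping test name -> status ("pass", "fail" or "n/a").

def regressions(previous_build, current_build):
    """Tests failing now that passed in the previous build."""
    return sorted(
        name
        for name, status in current_build.items()
        if status == "fail" and previous_build.get(name) == "pass"
    )

previous = {"boot": "pass", "kselftest/timers": "pass", "ltp/mm": "n/a"}
current = {"boot": "pass", "kselftest/timers": "fail", "ltp/mm": "fail"}
print(regressions(previous, current))  # ['kselftest/timers'] -- ltp/mm is missed
```

Note how `ltp/mm` is not reported even though it is now failing, because its previous status was "n/a"; this is exactly the gap described below.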
Sometimes a test does not run in a build for a variety of reasons, which leaves its status as "n/a". In that case, even though the test may have regressed, it will not be listed as a regression. An example can be seen here.
Other times a bad baseline is used, which can hide errors. For example, suppose a release is tested (the desired baseline), a release candidate is then pushed and tested, and finally a very small change is made to the release candidate and it is pushed once more. Any failures introduced in the initial RC will not be shown for the subsequent RC, because the comparison uses the previous build as the baseline rather than the previous release.
To improve this situation, a test should be able to report that it has been "failing for N builds", so that it can be evaluated accordingly.
Lastly, some tests fail intermittently. This could be detected by looking through the individual history of a failing test and determining whether it frequently fails and then passes again.
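One possible heuristic for this, assuming the per-test history is available as a list of statuses ordered oldest to newest; the window size and flip threshold are illustrative values, not existing qa-reports behaviour:

```python
# A possible flakiness heuristic (illustrative values, not existing qa-reports
# behaviour): look at the recent history and count pass/fail transitions.

def is_intermittent(history, window=10, min_flips=2):
    """True if the test flipped between pass and fail several times recently."""
    recent = [s for s in history[-window:] if s in ("pass", "fail")]
    flips = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
    return flips >= min_flips

print(is_intermittent(["pass", "fail", "pass", "fail", "fail"]))  # True
print(is_intermittent(["pass", "pass", "fail", "fail", "fail"]))  # False
```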
Then, failing tests could clearly show:
- New regression (i.e. this test was passing and it has now failed for the first time)
- Failing for N builds/Y days (i.e. this test has been failing for some period of time)
- Failing intermittently (i.e. this test sometimes fails, and sometimes passes)
- Never passed
This mirrors the analysis that we currently do manually on all failures reported in qa-reports.
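A rough sketch of how those categories could be derived from a test's history, under the same assumptions as the sketches above (hypothetical names, statuses as strings, history ordered oldest to newest, "n/a" entries skipped):

```python
# Sketch only, with hypothetical names: derive a label for a currently-failing
# test from its status history (oldest to newest), skipping "n/a" entries.

def classify_failure(history, window=10, min_flips=2):
    runs = [s for s in history if s in ("pass", "fail")]
    if not runs or runs[-1] != "fail":
        return None  # not currently failing, nothing to report
    if "pass" not in runs:
        return "never passed"
    # same flip-counting idea as the intermittent-failure sketch above
    recent = runs[-window:]
    if sum(1 for a, b in zip(recent, recent[1:]) if a != b) >= min_flips:
        return "failing intermittently"
    # count how many builds at the end of the history have been failing
    failing_for = 0
    for status in reversed(runs):
        if status != "fail":
            break
        failing_for += 1
    return "new regression" if failing_for == 1 else f"failing for {failing_for} builds"

print(classify_failure(["pass", "pass", "fail"]))                  # new regression
print(classify_failure(["pass", "fail", "n/a", "fail", "fail"]))   # failing for 3 builds
print(classify_failure(["pass", "fail", "pass", "fail", "fail"]))  # failing intermittently
```

"Failing for N builds" could also be reported as a number of days by looking at build timestamps instead of counting builds, but the basic idea is the same.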