Report timeouts with a TIMEOUT status instead of FAILED
I am the author of PR qtile/qtile#2331 and want to add pytest-timeout to the qtile test suite.
The change I made to conftest.py was to implement a TIMEOUT status plus inline timings, to make it more obvious which tests fail and which ones time out. Two of the maintainers commented that they would prefer to have this functionality built into pytest-timeout, and I agreed. So here is my attempt at it, and we'd love to get some feedback on whether something like this could be added to pytest-timeout.
Example output, luckily without `TIMEOUT` or `FAILED (...)`:
test/test_bar.py::test_basic PASSED (0.001) [ 2%]
test/test_bar.py::test_basic PASSED (0.116) [ 2%]
test/test_bar.py::test_basic PASSED (0.036) [ 2%]
test/test_bar.py::test_singlespacer PASSED (0.001) [ 2%]
test/test_bar.py::test_singlespacer PASSED (0.114) [ 2%]
test/test_bar.py::test_singlespacer PASSED (0.035) [ 2%]
test/test_bar.py::test_nospacer PASSED (0.001) [ 2%]
test/test_bar.py::test_nospacer PASSED (0.115) [ 2%]
test/test_bar.py::test_nospacer PASSED (0.034) [ 2%]
test/test_bar.py::test_configure_broken_widgets PASSED (0.001) [ 2%]
test/test_bar.py::test_configure_broken_widgets PASSED (0.112) [ 2%]
test/test_bar.py::test_configure_broken_widgets PASSED (0.039) [ 2%]
test/test_bar.py::test_bar_hide_show_with_margin PASSED (0.001) [ 2%]
test/test_bar.py::test_bar_hide_show_with_margin PASSED (0.207) [ 2%]
test/test_bar.py::test_bar_hide_show_with_margin PASSED (0.035) [ 2%]
test/test_check.py::test_check_default_config SKIPPED (needs mypy) [ 3%]
test/test_check.py::test_check_default_config PASSED (0.000) [ 3%]
test/test_check.py::test_check_bad_syntax SKIPPED (needs mypy) [ 3%]
test/test_check.py::test_check_bad_syntax PASSED (0.000) [ 3%]
A custom status could be nice. Probably should have a better mechanism to convey the test status instead of the current grepping for `Timeout` though, which you could do if you did this in timeout itself.
What other implications does introducing a new status have? You mention something about durations? What are other ways we could accidentally break test suites by introducing this? Would the suite still fail if there's a timeout?
A custom status could be nice. Probably should have a better mechanism to convey the test status instead of the current grepping for `Timeout` though, which you could do if you did this in timeout itself.
My up-to-date and original attempts print TIMEOUT and T, respectively.
What other implications does introducing a new status have?
I am not sure whether this could break anything, but I don't think so. The category returned in case of a timeout is still `failed`, so everything should keep working.
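As a rough sketch of the idea (assuming a `timed_out` flag on the report, set by `pytest_runtest_makereport` as described further down), the status hook only changes the printed words while keeping the `failed` category:

```python
import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_report_teststatus(report, config):
    outcome = yield
    # only the displayed words change; the category stays "failed",
    # so summary counts and the exit code are unaffected
    if getattr(report, "timed_out", False):
        outcome.force_result(("failed", "T", "TIMEOUT"))
```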
You mention something about durations?
Yep, I got confused by the call_only option. We probably want a timeout for the complete test including setup and teardown. But when a timeout occurred and the duration of the test got printed, it was the call duration only, which was smaller than the timeout. That was confusing. I have this right now, but it has increased the complexity a bit. All the timings are now printed as a sum:
test/layouts/test_common.py::test_remove_floating[Stack] PASSED (0.111+1.051+0.041=1.203) [ 77%]
test/layouts/test_common.py::test_remove_floating[Tile] PASSED (0.112+1.052+0.039=1.203) [ 77%]
test/layouts/test_common.py::test_remove_floating[TreeTab] PASSED (0.119+1.136+0.052=1.308) [ 77%]
test/layouts/test_common.py::test_remove_floating[VerticalTile] PASSED (0.114+1.112+0.040=1.266) [ 77%]
test/layouts/test_common.py::test_remove_floating[Zoomy] PASSED (0.110+1.037+0.040=1.187) [ 78%]
test/layouts/test_common.py::test_desktop_notifications[Bsp] PASSED (0.110+1.171+0.039=1.321) [ 78%]
test/layouts/test_common.py::test_desktop_notifications[Columns] PASSED (0.111+1.329+0.041=1.482) [ 78%]
test/layouts/test_common.py::test_desktop_notifications[Floating] PASSED (0.110+1.243+0.040=1.393) [ 78%]
test/layouts/test_common.py::test_desktop_notifications[Matrix] PASSED (0.110+1.286+0.040=1.436) [ 78%]
test/layouts/test_common.py::test_desktop_notifications[Max] PASSED (0.112+1.186+0.038=1.336) [ 79%]
test/layouts/test_common.py::test_desktop_notifications[MonadTall] PASSED (0.111+1.172+0.040=1.323) [ 79%]
It looks a lot nicer in a proper terminal of course; this is the output from our CI setup.
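Just to illustrate the format, the inline sum could be derived from the per-phase report durations along these lines (a hypothetical helper, not the code from my branch):

```python
def format_timings(reports):
    # render the per-phase durations as "(setup+call+teardown=total)"
    durations = [getattr(report, "duration", 0.0) for report in reports]
    parts = "+".join(f"{d:.3f}" for d in durations)
    return f"({parts}={sum(durations):.3f})"
```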
What are other ways we could accidentally break test suites by introducing this?
This could of course break test suites if exceptions occur when running those functions. I tried to make everything as exception-proof as in _pytest.runner, so checking with hasattr etc. should already be enough to avoid errors.
- I am pretty confident that `pytest_runtest_makereport` cannot fail.
- `pytest_report_teststatus` is close to the `_pytest.runner` original and could only fail if
  - pytest overrides the custom `timings` attribute. This could be prevented by choosing a more explicit name, e.g. `_timeout_plugin_timings` or so.
  - `report.duration` is set and is not a number (does not support addition) for at least one report.
  - `report` has no attribute `timed_out` (which should always be set by `pytest_runtest_makereport`). This one can easily be fixed with `getattr`, I'll do this later. The links in this section right here are permalinks, so check out the latest changes to that branch yourself.
  - `report` is missing `passed`, `failed` or `skipped`, e.g. it isn't a `TestReport` instance. That would break the `_pytest.runner` implementation, too.
- `pytest_runtest_protocol` could only fail if
  - `reports = _pytest.runner.runtestprotocol(...)` (internal stuff is bad, I know, but this sadly cannot be avoided when printing the timings after all phases have been run) returns something that is `bool(reports) = True` but not iterable. The return annotation of that internal function is `-> List[TestReport]`.
  - `report` is missing `passed`, `failed` or `skipped`, e.g. it isn't a `TestReport` instance. That would break the `_pytest.runner` implementation, too.
All of those possible issues either break _pytest itself, are highly unlikely (e.g. why would pytest set `report.duration = tuple()` or something else that cannot be added together?), or are easy to spot (`runtestprotocol` changing its return type).
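The `timed_out` flag and the `getattr` fix mentioned above could look roughly like this (a sketch under the assumption that the timeout is detected from the failure text, which is what one currently has to grep for):

```python
import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    # assumption: recognise the timeout from the failure text, as one would
    # currently do by grepping the output for "Timeout"
    report.timed_out = report.failed and "Timeout" in str(report.longrepr or "")
```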
If printing the timing information is too verbose or too far off from the current pytest behaviour, it can of course be removed. In that case pytest_runtest_protocol would not be required anymore and the dependency on the internal _pytest.runner.runtestprotocol would be gone.
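To make that dependency concrete, the protocol override would look roughly like the following (a simplified sketch using the `timings` attribute name discussed above; the actual code lives in my branch):

```python
from _pytest import runner


def pytest_runtest_protocol(item, nextitem):
    item.ihook.pytest_runtest_logstart(nodeid=item.nodeid, location=item.location)
    # run setup/call/teardown without logging, so the summed timings can be
    # attached to every report before it is reported
    reports = runner.runtestprotocol(item, nextitem=nextitem, log=False)
    durations = [getattr(report, "duration", 0.0) for report in reports]
    for report in reports:
        report.timings = (durations, sum(durations))
        item.ihook.pytest_runtest_logreport(report=report)
    item.ihook.pytest_runtest_logfinish(nodeid=item.nodeid, location=item.location)
    return True
```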
Inline timings are definitely a nice-to-have, because when running pytest -x --durations=0 and a timeout occurs, the duration statistics might not be printed (method=thread). Also, the --durations output is separated by phase, so there can be up to three different lines for the same test function.
Would the suite still fail if there's a timeout?
Yes, it does still fail. At least I haven't seen a test suite that timed out and still got an overall PASS.
I threw this at one of my personal projects and forced some timeouts:
...
test/unit_systems/abstract/test_type_utils.py::test_value_matches_annotation PASSED (0+0+0=0) [ 46%]
test/unit_systems/abstract/test_type_utils.py::test_overload PASSED (0+0.005+0=0.005) [ 53%]
test/unit_systems/abstract/test_type_utils.py::test_typeguard TIMEOUT (0+5.000=5.001) [ 60%]
test/unit_systems/abstract/test_type_utils.py::test_format_a_kw TIMEOUT (0+4.999+0=5.000) [ 66%]
test/unit_systems/si/test_si.py::test_unit_system FAILED (0+0.069+0=0.069) [ 73%]
test/unit_systems/si/test_si.py::test_base_units FAILED (0+0.003+0=0.004) [ 80%]
test/unit_systems/si/test_si.py::test_derived_units FAILED (0+0.005+0=0.006) [ 86%]
test/unit_systems/si/test_si.py::test_prefixes FAILED (0+0.002+0=0.003) [ 93%]
test/unit_systems/si/test_si.py::test_number_prefix_product FAILED (0+0.005+0=0.005) [100%]
The first timeout occurred during the call phase (s+c=total) and everything worked out fine. The second one occurred during the teardown phase (s+c+t=total). Here things start to become wobbly:
traceback (all the other failed tests have the exact same traceback):

```
_________________________________ test_unit_system _________________________________

cls = <...>, func = <... at 0x7f9a6e854ee0>, when = 'teardown'
reraise = (<...>, <...>)

    @classmethod
    def from_call(
        cls,
        func: "Callable[[], TResult]",
        when: "Literal['collect', 'setup', 'call', 'teardown']",
        reraise: Optional[
            Union[Type[BaseException], Tuple[Type[BaseException], ...]]
        ] = None,
    ) -> "CallInfo[TResult]":
        excinfo = None
        start = timing.time()
        precise_start = timing.perf_counter()
        try:
>           result: Optional[TResult] = func()

.tox/py39/lib/python3.9/site-packages/_pytest/runner.py:311:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py39/lib/python3.9/site-packages/_pytest/runner.py:255: in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
.tox/py39/lib/python3.9/site-packages/pluggy/hooks.py:286: in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
.tox/py39/lib/python3.9/site-packages/pluggy/manager.py:93: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
.tox/py39/lib/python3.9/site-packages/pluggy/manager.py:84: in <lambda>
    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
.tox/py39/lib/python3.9/site-packages/_pytest/runner.py:175: in pytest_runtest_teardown
    item.session._setupstate.teardown_exact(item, nextitem)
.tox/py39/lib/python3.9/site-packages/_pytest/runner.py:419: in teardown_exact
    self._teardown_towards(needed_collectors)
.tox/py39/lib/python3.9/site-packages/_pytest/runner.py:434: in _teardown_towards
    raise exc
.tox/py39/lib/python3.9/site-packages/_pytest/runner.py:427: in _teardown_towards
    self._pop_and_teardown()
.tox/py39/lib/python3.9/site-packages/_pytest/runner.py:387: in _pop_and_teardown
    self._teardown_with_finalization(colitem)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.runner.SetupState object at ...>, colitem = <...>

    def _teardown_with_finalization(self, colitem) -> None:
        self._callfinalizers(colitem)
        colitem.teardown()
        for colitem in self._finalizers:
>           assert colitem in self.stack
E           AssertionError
```
...
test/unit_systems/abstract/test_type_utils.py::test_value_matches_annotation PASSED (0+0+0=0) [ 46%]
test/unit_systems/abstract/test_type_utils.py::test_overload PASSED (0+0.011+0=0.011) [ 53%]
test/unit_systems/abstract/test_type_utils.py::test_typeguard TIMEOUT (0+5.000=5.000) [ 60%]
test/unit_systems/abstract/test_type_utils.py::test_format_a_kw PASSED (0+0+0=0) [ 66%]
test/unit_systems/si/test_si.py::test_unit_system PASSED (0+0.073+0=0.073) [ 73%]
test/unit_systems/si/test_si.py::test_base_units PASSED (0+0.003+0=0.003) [ 80%]
test/unit_systems/si/test_si.py::test_derived_units PASSED (0+0.006+0=0.006) [ 86%]
test/unit_systems/si/test_si.py::test_prefixes PASSED (0+0.002+0=0.002) [ 93%]
test/unit_systems/si/test_si.py::test_number_prefix_product PASSED (0+0.004+0=0.005) [100%]
With that single timeout, the test suite still fails:
=================================================================================== short test summary info ===================================================================================
TIMEOUT (0+5.000=5.000) test/unit_systems/abstract/test_type_utils.py::test_typeguard - Failed: Timeout >5.0s
================================================================================ 1 failed, 14 passed in 5.43s =================================================================================
ERROR: InvocationError for command ~/git/abstract_unit_systems/.tox/py39/bin/coverage run --source=unit_systems -m pytest -W error:::unit_systems (exited with code 1)
___________________________________________________________________________________________ summary ___________________________________________________________________________________________
ERROR: py39: commands failed
Note that the weird issue with the timeout happening during teardown came from a single lucky run. Other exceptions may or may not be thrown when running the tests repeatedly with sleeps.
I forked this repo to https://www.github.com/ep12/pytest-timeout/tree/timing_enhancements and added some commits that contain everything that would be helpful for qtile/qtile. Example output looks like this:
test/layouts/test_max.py::test_max_window_focus_cycle[MaxConfig-xephyr0] PASSED (899ms+3.92s+664ms=5.48s) [ 36%]
test/layouts/test_ratiotile.py::test_ratiotile_add_windows[RatioTileConfig-xephyr0] PASSED (909ms+6.29s+1.02s=8.23s) [ 36%]
test/layouts/test_ratiotile.py::test_ratiotile_add_windows_golden_ratio[RatioTileConfig-xephyr0] PASSED (916ms+6.24s+841ms=8.00s) [ 36%]
See https://github.com/ep12/qtile/tree/time_ci_runs_pytest_timeout_upstream for the options, and check the GitHub Actions CI logs for more output.
Hello again! I just want to point you to my comment in the corresponding qtile issue thread. It looks like the method `both` I implemented in 99edbf04 works. Would you like to verify that?
@flub or anyone else: has anyone had a look at this? I think at least the first two commits (d8275117ef11001996d391d3ee66cf182d1dbd67 and b7f37f8286f4b114fb9384e834d44545eae218dd) would be nice to have. The second commit adds some functionality such that the third commit could even partially be realised in conftest.py. The first commit adds a new method that is a combination of thread and signal: the signal method is tried first and, if the test does not react to it in time, the thread method kills it. That should theoretically combine the reliability of thread with the forgiveness of signal. It's just a proof of concept and not a complete PR, of course.
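For what it's worth, a purely illustrative sketch of that combined idea (my own simplification, not the code from the commit) could look like this: arm a SIGALRM first, and keep a timer thread as a fallback that terminates the process if the signal could not interrupt the test in time.

```python
import os
import signal
import threading


def arm_combined_timeout(timeout: float, grace: float = 1.0) -> threading.Timer:
    # illustrative only: signal-based interruption first, thread-based hard
    # kill as a fallback; must be called from the main thread
    def hard_kill() -> None:
        # last resort, similar to what the thread method already does
        os._exit(1)

    fallback = threading.Timer(timeout + grace, hard_kill)
    fallback.daemon = True

    def on_alarm(signum, frame):
        # the signal managed to interrupt the test, so no hard kill is needed
        fallback.cancel()
        raise TimeoutError(f"Timeout >{timeout}s")

    signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    fallback.start()
    return fallback
```

The real commit would of course have to integrate this with pytest-timeout's existing setup/teardown of the timeout and its traceback dumping; the sketch only shows the ordering of the two mechanisms.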
What are your opinions regarding this all?
Hey, sorry, I don't have a lot of time to spend on pytest-timeout, so if no one else offers reviews I'm afraid my rather sporadic looks at pytest-timeout are all there is.
Anyway, having a quick look at those two commits, they do indeed look like interesting features. Would you like to open, preferably separate, PRs for them?
Nothing to apologise for, really! We all have a lot of stuff to do... I'll make those PRs, hopefully relatively soon, though I can't promise it. I'll probably have to look a bit more at the project structure to write the docs and tests, so that might take a bit. When everything is ready for review I'll mention you so you get notified.
This looks like what I want: I hope it will print TIMEOUT if a test failed because of a timeout. So how is it going now?
Still in progress at the moment...