Too-slow health check doesn't say what was too slow
From https://travis-ci.org/LeastAuthority/txkube/builds/194934837
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/txkube/test/test_model.py", line 94, in test_roundtrip
    def test_roundtrip(self, obj):
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/hypothesis/core.py", line 438, in wrapped_test
    HealthCheck.too_slow,
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/hypothesis/core.py", line 306, in fail_health_check
    raise FailedHealthCheck(message)
hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 9 valid examples in 1.13 seconds (0 invalid ones and 1 exceeded maximum size). Try decreasing size of the data you're generating (with e.g. average_size or max_leaves parameters).
See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.
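For reference, the two remedies the error message suggests look roughly like this. The strategy below is purely illustrative (not txkube's actual model strategy); only the settings/HealthCheck usage is the real Hypothesis API:

```python
from hypothesis import HealthCheck, given, settings, strategies as st

# Bounding the generated data (max_size here) addresses the underlying
# slowness; suppress_health_check silences just this one check, as the
# message suggests. The strategy itself is a stand-in.
@settings(suppress_health_check=[HealthCheck.too_slow])
@given(obj=st.lists(st.integers(), max_size=10))
def test_roundtrip(obj):
    assert obj == list(obj)
```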
Fortunately there's only a single strategy used by the test that failed here, so (by reading the source) it's easy to find out what it was. If there had been multiple strategies, it would have been more of a pain to track down.
Yeah, there need to be better tools for this, but it isn't a simple matter of fixing the error message, so it won't happen without at least a moderate amount of work.
From Hypothesis's point of view there's only a single strategy - one that generates all the arguments for the function.
Generalizing somewhat: even if there is only a single strategy, if it combines others (e.g. with builds) then it could be that one sub-strategy is slow while the rest are fine. It seems like the health check could only report the overall strategy as the problem, though, so the debugging challenge would remain significant.
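To illustrate the situation: builds fuses several sub-strategies into one, so the health check can only blame the composite. The names below are hypothetical examples, not from the failing test:

```python
from hypothesis import strategies as st

# `builds` combines the sub-strategies into a single strategy, so even if
# only one of them were slow (say, a deeply recursive one), the too_slow
# health check could only point at `person` as a whole.
names = st.text(min_size=1, max_size=20)
ages = st.integers(min_value=0, max_value=120)
person = st.builds(dict, name=names, age=ages)
```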
A possible solution could be to instrument the included strategies so that their performance can be measured individually (e.g. have each log a start and stop event with a timestamp). This would also make it more reasonable to turn off the too-slow health check entirely, because one could keep monitoring the timing information independently and take action when a strategy is found to be too slow. For example, imagine feeding that data into a monitoring system with graphing and alerts for when performance dips below some threshold...
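One way that instrumentation idea might be sketched today, from user code. The `timed` wrapper and `timings` dict are hypothetical helpers, not part of Hypothesis's API; only `st.composite` and `st.builds` are real:

```python
import time
from collections import defaultdict

from hypothesis import strategies as st

# Per-label draw durations; a real setup might ship these to a
# monitoring backend instead of keeping them in memory.
timings = defaultdict(list)

def timed(label, strategy):
    """Wrap `strategy` so every draw records its duration under `label`."""
    @st.composite
    def inner(draw):
        start = time.perf_counter()
        value = draw(strategy)
        timings[label].append(time.perf_counter() - start)
        return value
    return inner()

# Each sub-strategy logs its own timings, so a slow one can be singled
# out even after they are combined into one composite strategy.
names = timed("names", st.text(max_size=20))
ages = timed("ages", st.integers(min_value=0, max_value=120))
person = st.builds(dict, name=names, age=ages)
```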
> A possible solution could be to instrument the included strategies so that their performance can be measured individually (e.g. have each log a start and stop event with a timestamp).
Yeah, that's more or less the solution I had in mind too. There's no technical difficulty doing it, it just hasn't been done because it's not been a priority for me.
I just spent a day or so playing with Hypothesis for Django, and this was fatal for me. I don't want to spend my days debugging my strategies. For a while it was rejecting on model validation and I at least knew where to put a breakpoint. Now it's apparently rejecting examples for some other reason, and I think I'd have to read your entire code base to figure out where and why.