Tests t-autocompact and t-corruption can be flaky on low-power systems

Open enter-github-username opened this issue 2 years ago • 0 comments

The two tests test/t-autocompact and test/t-corruption can fail unexpectedly on low-power systems (e.g. a CI worker with restricted CPU resources, or old hardware) because the hardcoded wait of 1 second (1000000 µs) may not be long enough. If the previous operation has not finished by the time ldb_sleep_usec returns, the tests fail.

The log files show that assertions are violated:

iter   1 => 100.040 MB [other   0.000 MB]
iter   2 => 100.040 MB [other   0.000 MB]
iter   3 => 100.040 MB [other   0.000 MB]
iter   4 => 100.040 MB [other   0.000 MB]
iter   5 => 100.040 MB [other   0.000 MB]
iter   6 => 100.040 MB [other   0.000 MB]
iter   7 =>  59.008 MB [other   0.000 MB]
iter   8 =>  38.492 MB [other   0.000 MB]
iter   9 =>  17.976 MB [other   0.000 MB]
iter  10 =>   4.103 MB [other   0.000 MB]
iter   1 =>  50.017 MB [other  50.023 MB]
iter   2 =>  50.020 MB [other  50.021 MB]
iter   3 =>  50.020 MB [other  50.021 MB]
iter   4 =>  50.020 MB [other  50.021 MB]
iter   5 =>  50.020 MB [other  50.021 MB]
iter   6 =>  50.020 MB [other  50.021 MB]
iter   7 =>  29.504 MB [other  29.505 MB]
iter   8 =>   8.988 MB [other  29.505 MB]
iter   9 =>   4.103 MB [other   0.000 MB]
t-autocompact.c:193: Assertion `final_size >= other_size / 5 - 1048576' failed.
FAIL t-autocompact (exit status: 134)

Increasing the sleep time would reduce the number of systems affected; however, it would also increase the execution time of the test suite. Another solution would be to implement some kind of synchronization mechanism to remove the need for a hardcoded time limit. I'm not sure whether that's worth it, though, as it might be quite a bit of work. :-)
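To make the second option a bit more concrete, here is a rough sketch of what I have in mind: poll for a condition with a short interval and a generous upper bound, instead of sleeping for a fixed time. Fast systems return as soon as the condition holds, slow systems get more headroom. Everything in it is made up for illustration (`wait_until`, `wait_pred_fn`, `now_usec`, `sleep_usec`, `countdown_done` are not part of the library), and it assumes a POSIX system with `clock_gettime` and `nanosleep`; in the test suite, `ldb_sleep_usec` could presumably take the place of `sleep_usec`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited condition holds. */
typedef bool (*wait_pred_fn)(void *ctx);

/* Current monotonic time in microseconds. */
static uint64_t now_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Sleep for roughly usec microseconds (stand-in for ldb_sleep_usec). */
static void sleep_usec(uint64_t usec)
{
    struct timespec ts;
    ts.tv_sec = (time_t)(usec / 1000000u);
    ts.tv_nsec = (long)(usec % 1000000u) * 1000L;
    nanosleep(&ts, NULL);
}

/*
 * Poll pred(ctx) every poll_usec microseconds until it returns true or
 * timeout_usec microseconds have elapsed. Returns true if the condition
 * was met, false on timeout.
 */
static bool wait_until(wait_pred_fn pred, void *ctx,
                       uint64_t timeout_usec, uint64_t poll_usec)
{
    assert(pred != NULL && poll_usec > 0);
    uint64_t deadline = now_usec() + timeout_usec;
    for (;;) {
        if (pred(ctx))
            return true;
        if (now_usec() >= deadline)
            return false;
        sleep_usec(poll_usec);
    }
}

/* Demo predicate: reports false until the counter has reached zero. */
static bool countdown_done(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

In the tests, the predicate would check whatever the fixed sleep is currently waiting for (for example, that the file size has dropped below some threshold), and the test would assert on the return value of `wait_until` with a timeout of, say, 30 seconds. This is still polling rather than real synchronization, but it would not require any changes to the library itself.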

Mar 18 '23 13:03 enter-github-username