SmartSim icon indicating copy to clipboard operation
SmartSim copied to clipboard

Unfalsifiable tests in ``tests/test_interupt.py``

Open MattToast opened this issue 2 years ago • 0 comments

Description

SmartSim has tests in the tests/test_interupt.py test module that test that SmartSim is able to stop any jobs it may have launched in the event that it receives a SIGTERM or other fatal signals that can be handled (e.g. not SIGKILL).

SmartSim does this by registering a custom signal handler when the Controller.start method is called (which is called in turn when a user calls Experiment.start), which overides the default signal handler that raises a KeyboardInterrupt error; however, the test work by assuming that a KeyboardInterrupt exception will be thrown and puts all of the code in an except clause handling the error.

But of course no KeyboardInterrupt is thrown because the signal handler that throws the exception has been overwritten, thus effectively placing the part of the test that evaluates the code in a dead block. This means that these tests are more or less impossible to fail (assuming nothing breaks during launching).

How to reproduce

Add these lines to the evaluating code in tests/test_interupt.py:

--- a/tests/test_interrupt.py
+++ b/tests/test_interrupt.py
@@ -69,6 +69,7 @@ def test_interrupt_blocked_jobs(make_test_dir):
         keyboard_interrupt_thread.start()
         exp.start(model, ensemble, block=True, kill_on_interrupt=True)
     except KeyboardInterrupt:
+        assert False
         time.sleep(2)  # allow time for jobs to be stopped
         active_jobs = exp._control._jobs.jobs
         active_db_jobs = exp._control._jobs.db_jobs
@@ -112,6 +113,7 @@ def test_interrupt_multi_experiment_unblocked_jobs(make_test_dir):
             experiment.start(model, ensemble, block=False, kill_on_interrupt=True)
         time.sleep(9)  # since jobs aren't blocked, wait for SIGINT
     except KeyboardInterrupt:
+        assert False
         time.sleep(2)  # allow time for jobs to be stopped
         for i, experiment in enumerate(experiments):
             active_jobs = experiment._control._jobs.jobs

These assert Falses should always raise an error when the test is ran. However, you will notice that test will pass without issue

$ pytest tests/test_interrupt.py
# ... output abridged for readability ...
tests/test_interrupt.py::test_interrupt_blocked_jobs PASSED                       [ 50%]
tests/test_interrupt.py::test_interrupt_multi_experiment_unblocked_jobs PASSED    [100%]
# ... more abridging ...
$ echo $?
0

Expected behavior

Manditory:

  • [x] These tests should have their evaluation code placed in a block that is reachable. If an assert False is placed in that block, it should always raise and the test should fail.

Optional (depending on desired scope of this issue):

  • [ ] There is an argument to be made that the Controller should restore the default signal handlers after all of the launched jobs have completed. This may require moving the registering of the signal handler from the Controller to the JobManager where lifetimes of managed jobs are more easily tracked.
  • [x] There is also good argument to be made that SmartSim should raise a KeyboardInterrupt exception in the that a custom signal handler has been registered and and the interperter recieves a SIGTERM as that is likely to be more alligend with a user's expected behavior when writing/using a SmartSim driver script.

System

All

MattToast avatar Oct 23 '23 17:10 MattToast