
Handle when process terminates unexpectedly

Open jbridger opened this issue 2 years ago • 0 comments

This is our attempt at fixing https://github.com/hrcorval/behavex/issues/114. We ran into cases where behavex would never terminate if a test segfaulted. The changes address this by:

  • Eventually exiting with a non-zero exit status if a spawned process terminates unexpectedly
  • Outputting more information to help with debugging which test failed

We're not expert Python developers and explored several options before settling on this. Happy to have further discussions if necessary 🙂

Handling unexpected process termination

We switched to concurrent.futures.ProcessPoolExecutor, which handles a worker process terminating unexpectedly; the process pool we used previously does not handle this yet. When a process terminates unexpectedly, all running and queued tasks are cancelled and the pool can no longer accept new tasks. The cancelled futures all get the same BrokenProcessPool error, so it is not possible to tell which process terminated unexpectedly, even though that is what would help the user debug the cause of the failure.

Because of this, if a BrokenProcessPool is encountered we don't generate the end report or the statistics, as the data would be incomplete. behavex no longer waits indefinitely; it exits with a non-zero exit code.
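To illustrate the behaviour described above, here is a minimal standalone sketch, not the code in this PR. The names passing_task, crashing_task, outcome and run_all are made up for the example, and os._exit(1) is only a stand-in for a segfault.

```python
import os
import time
from concurrent.futures import Future, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def passing_task() -> str:
    """Hypothetical stand-in for a test that completes normally."""
    time.sleep(0.2)
    return "passed"


def crashing_task() -> None:
    """Hypothetical stand-in for a segfault: the worker dies without cleanup."""
    os._exit(1)


def outcome(future: Future) -> str:
    """Classify a finished future without letting its exception escape."""
    if future.cancelled():
        return "cancelled"
    exc = future.exception()
    if isinstance(exc, BrokenProcessPool):
        return "pool broken"
    if exc is not None:
        return "raised " + type(exc).__name__
    return "ok"


def run_all(tasks, max_workers=2):
    """Run tasks in a process pool; return (outcomes, pool_broken)."""
    futures = []
    pool_broken = False
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        for task in tasks:
            try:
                futures.append(pool.submit(task))
            except BrokenProcessPool:
                # A broken pool refuses any further submissions.
                pool_broken = True
                break
    # Leaving the with-block waits until every submitted future is done.
    outcomes = [outcome(f) for f in futures]
    return outcomes, pool_broken or "pool broken" in outcomes


if __name__ == "__main__":
    outcomes, pool_broken = run_all([passing_task, crashing_task])
    print(outcomes)
    # With incomplete data there is no point building a report;
    # a real runner would pass this value to sys.exit().
    exit_code = 1 if pool_broken else 0
    print("Exit code:", exit_code)
```

Whether the passing task is reported as "ok" or "pool broken" depends on timing: the pool marks every unfinished future as broken, including ones whose worker was healthy. That is why the error alone cannot identify the culprit.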

More debug information

We also wanted behavex to output enough information to help the user narrow down which tests are the culprit. In order to do this, given the limited information we can get from the Python process pool, we use a SyncManager list that is shared between the process executing the test and the callback functions.

When the execute_test task runs in a child process, it adds the test it is running to the list. When the callback is called on task completion, we remove the test from the list. Once all the tests complete, the list should be empty. If it isn't, a process terminated unexpectedly, and the remaining entries tell us which tests failed to complete.

When the pool breaks, all running and queued futures are cancelled, which also triggers this callback. In the callback, we don't remove the test from the list if the cause is a BrokenProcessPool. By keeping this information, we know which tests were running at the time a process died. Unfortunately, the list also includes tests that were running in other processes, not only the one that terminated unexpectedly. However, we have at least made a smaller haystack to look for the needle in.
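The tracking idea can be sketched roughly as follows. This is a simplified illustration, not the actual behavex code: the execute_test signature, make_done_callback and run_tracked are assumptions made for the example, and os._exit(1) again stands in for a segfault.

```python
import multiprocessing
import os
import time
from concurrent.futures import Future, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def execute_test(running, test_id, crash=False):
    """Runs in a child process: record the test as started, then run it."""
    running.append(test_id)
    if crash:
        os._exit(1)  # simulate a segfault: the worker dies without cleanup
    time.sleep(0.5)  # stand-in for running the real test


def make_done_callback(running, test_id):
    """Build a callback that clears test_id unless the pool broke."""
    def on_done(future: Future) -> None:
        if future.cancelled():
            return  # never started, so it was never recorded
        if isinstance(future.exception(), BrokenProcessPool):
            return  # keep the entry: this test may have killed the worker
        if test_id in running:
            running.remove(test_id)
    return on_done


def run_tracked(tests):
    """Run (test_id, crash) pairs; return the tests still marked as running."""
    with multiprocessing.Manager() as manager:
        running = manager.list()  # list proxy shared with the workers
        with ProcessPoolExecutor(max_workers=2) as pool:
            for test_id, crash in tests:
                try:
                    future = pool.submit(execute_test, running, test_id, crash)
                except BrokenProcessPool:
                    break  # a broken pool accepts no more work
                future.add_done_callback(make_done_callback(running, test_id))
        # Copy the contents out before the manager shuts down.
        return list(running)


if __name__ == "__main__":
    unfinished = run_tracked([
        ("Passing scenario", False),
        ("Segfaulting scenario", True),
    ])
    if unfinished:
        print("These scenarios failed to complete for an unknown reason:")
        for test_id in unfinished:
            print("    Scenario name:", test_id)
```

The passing scenario may or may not appear in the result depending on whether it finished before the pool broke, which is the "smaller haystack" limitation described above.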

To further help with debugging, we moved behave's stdout logs into the output folder, under a path that includes an ID associated with each test execution. In the test output, we can then provide the specific path to the behave logs for the failing tests. These behave logs are useful in the event of a segfault because behave logs the steps that were executed up to the point where it failed.

An example of the behavex output when a segfault is encountered (2 parallel processes, scenario scheme):

These scenarios failed to complete for an unknown reason:
    Feature name: My feature. Feature file: path/to/myfeature.feature
        Scenario name: Passing scenario
            Behave log for scenario: /Users/myuser/project/output/behavex/logs/45194/behave.log
    Feature name: My feature 2. Feature file: path/to/myfeature2.feature
        Scenario name: Segfaulting scenario
            Behave log for scenario: /Users/myuser/project/output/behavex/logs/52551/behave.log
Exit code: 1

An example of the behavex output when a segfault is encountered (2 parallel processes, feature scheme):

These features failed to complete for an unknown reason:
    Feature name: My feature. Feature file: path/to/myfeature.feature
        Behave log for feature: /Users/myuser/project/output/behavex/logs/45194/behave.log
    Feature name: My feature 2. Feature file: path/to/myfeature2.feature
        Behave log for feature: /Users/myuser/project/output/behavex/logs/52551/behave.log
Exit code: 1

jbridger • Dec 04 '23 18:12