`nf` claims to have delivered `SIGINT` to all children when one exits, but does not actually do so
The README says:
If your processes exit, Node Foreman will assume an error has occurred and shut your application down.
nf does seem to detect the exit of a single child process, and claims to be sending a SIGINT to all children in response to it, but in fact will not deliver the SIGINT in all cases.
Here's a simple repro case:
❯ cat wait-for-sigint.sh
#!/bin/bash
function handle_sigint {
echo "got SIGINT, exiting ..."
exit
}
trap handle_sigint SIGINT
echo "started, sleeping forever awaiting SIGINT"
sleep 1000000
❯ cat Procfile
a: sleep 10 && exit 1
b: ./wait-for-sigint.sh
❯ nf start
12:38:47 PM b.1 | started, sleeping forever awaiting SIGINT
[DONE] Killing all processes with signal SIGINT
12:38:57 PM a.1 Exited with exit code null
< ... `nf` does not actually exit here, nor does the `b` child process running `wait-for-sigint.sh` ... >
Observations
If I modify wait-for-sigint.sh to emit a constant stream of output while it is waiting, then the test case works as expected:
❯ cat Procfile
a: sleep 5 && exit 1
b: ./wait-for-sigint-with-output.sh
❯ cat wait-for-sigint-with-output.sh
#!/bin/bash
function handle_sigint {
echo "got SIGINT, exiting ..."
exit
}
trap handle_sigint SIGINT
echo "started, sleeping forever awaiting SIGINT"
while true
do
echo 'still here'
sleep 1
done
❯ nf start
12:43:41 PM b.1 | started, sleeping forever awaiting SIGINT
12:43:41 PM b.1 | still here
12:43:42 PM b.1 | still here
12:43:43 PM b.1 | still here
12:43:44 PM b.1 | still here
12:43:45 PM b.1 | still here
[DONE] Killing all processes with signal SIGINT
12:43:45 PM a.1 Exited with exit code null
12:43:46 PM b.1 | got SIGINT, exiting ...
12:43:46 PM b.1 Exited Successfully
Comparison to other implementations
foreman (Ruby)
❯ foreman start
12:47:50 a.1 | started with pid 59856
12:47:50 b.1 | started with pid 59857
12:47:50 b.1 | started, sleeping forever awaiting SIGINT
12:47:55 a.1 | exited with code 1
12:47:55 system | sending SIGTERM to all processes
12:47:56 b.1 | terminated by SIGTERM
goreman (Go)
goreman has different default behavior wrt a single child process exiting:
❯ goreman start
12:45:24 a | Starting a on port 5000
12:45:24 b | Starting b on port 5100
12:45:24 b | started, sleeping forever awaiting SIGINT
12:45:29 a | Terminating a
... but with -exit-on-error ('Exit goreman if a subprocess quits with a nonzero return code'):
❯ goreman -exit-on-error start
12:46:10 a | Starting a on port 5000
12:46:10 b | Starting b on port 5100
12:46:10 b | started, sleeping forever awaiting SIGINT
12:46:15 a | Terminating a
12:46:15 b | got SIGINT, exiting ...
12:46:15 b | Terminating b
goreman: exit status 1
Turns out I had misdiagnosed this!
nf really was delivering SIGINT to all of its direct children. The problem is that it doesn't put each spawned child in its own process group, so if a child spawns children of its own and doesn't respond to SIGINT by exiting or by forwarding the signal to them, nf just hangs when one child exits.
In the repro case that I gave, the process tree looks like this after `a` exits:
❯ pstree -s nf.js
... snip ...
\-+= 83622 ben node nf.js start
\-+- 83628 ben /bin/bash ./wait-for-sigint.sh
\--- 83632 ben sleep 1000000
The bash process (pid 83628) actually has received the SIGINT, but per the bash manual:
When Bash receives a signal for which a trap has been set while waiting for a command to complete, the trap will not be executed until the command completes.
So in this example:
- `a` exited
- `nf` sent `SIGINT` to the direct child process for `b` (bash, pid=83628)
- `bash` got the `SIGINT`, but was waiting to invoke the trap handler until the `sleep` command (pid=83632) exited
- The `sleep` command itself never received the `SIGINT`
The way that goreman solves this is by creating a process group for each spawned child, and then delivering the SIGINT signals to the group, rather than the direct child.
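To illustrate the process-group idea in Node terms, here is a minimal sketch. It is not the code from goreman or from my fork; `spawnInOwnGroup` and `signalGroup` are hypothetical helper names, and running each command through `/bin/sh -c` is an assumption made for the example. It relies on two documented behaviors: `detached: true` in `child_process.spawn` makes the child the leader of a new process group on non-Windows platforms, and passing a negative pid to `process.kill` targets that whole group.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper (not nf's actual API): start a Procfile command in its
// own process group. On POSIX systems, `detached: true` makes the child the
// leader of a new process group, so its pid doubles as the group id.
function spawnInOwnGroup(command) {
  return spawn('/bin/sh', ['-c', command], {
    stdio: 'inherit',
    detached: true,
  });
}

// Hypothetical helper: signal every process in the child's group. A negative
// pid asks the OS to deliver the signal to the group rather than to a single
// process, so grandchildren such as a foreground `sleep` receive it too.
function signalGroup(child, signal = 'SIGINT') {
  // spawn() leaves pid undefined if the process could not be started.
  if (!child.pid) return false;
  try {
    process.kill(-child.pid, signal);
    return true;
  } catch (err) {
    // ESRCH: every process in the group has already exited.
    if (err.code === 'ESRCH') return false;
    throw err;
  }
}
```

With helpers like these, something along the lines of `const b = spawnInOwnGroup('./wait-for-sigint.sh')` followed later by `signalGroup(b)` should reach both the bash process and its `sleep`, which lets the pending trap run. One trade-off to keep in mind: on POSIX, `detached: true` also puts the child in a new session, so a Ctrl-C typed in the terminal would presumably no longer reach the children by itself and the supervisor would have to forward it.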
I've implemented support for using process groups in my fork (https://github.com/benweint/node-foreman/commit/5cb9ee5009772fce10eb1cafd9ffa00b7d780102) and can PR it if there's interest, but it looks like this project might be dead.