
Convenience function needed for `wait` usage in sharness tests?

Open: SteVwonder opened this issue 6 years ago • 0 comments

Continuation of the convo started in #2834. Starts with my late-night pontifications, skip to the end for "the good stuff".

I didn't realize that the behavior of `wait` isn't consistent across shells. I guess I should have expected as much. Here is what the man page says for bash v5 on Ubuntu 19.10:

       wait [-fn] [id ...]
              Wait for each specified child process and return its termination status. Each id may be a process ID or a job specification; if
              a job spec is given, all processes in that job's pipeline are waited for. If id is not given, all currently active child
              processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and
              returns its exit status. If the -f option is supplied, and job control is enabled, wait forces id to terminate before returning
              its status, instead of returning when it changes status. If id specifies a non-existent process or job, the return status is
              127. Otherwise, the return status is the exit status of the last process or job waited for.

And for dash on Ubuntu 19.10:

     wait [job]
            Wait for the specified job to complete and return the exit status of the last process in the job.  If the argument is omitted,
            wait for all jobs to complete and return an exit status of zero.

As @grondo / @garlick pointed out in #2834, you need to specify the PIDs explicitly; otherwise you will always get an exit status of 0. Going a little further, though, I was disturbed that dash's `wait` documents only a single job argument. I was also disturbed by bash's note that "the return status is the exit status of the last process or job waited for." So I threw together a quick test (hopefully it tests what I think it does):

❯ dash -c 'sleep 10 && echo "Job 1" & pid1=$!; sleep 9 && echo "Job 2" && exit 2 & pid2=$!; wait $pid1 $pid2; echo $?'
Job 2
Job 1
2
                                                                                                                                                  
❯ dash -c 'sleep 10 && echo "Job 1" & pid1=$!; sleep 9 && echo "Job 2" && exit 2 & pid2=$!; wait $pid2 $pid1; echo $?'
Job 2
Job 1
0
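The no-argument pitfall from #2834 can also be reproduced with a short POSIX `sh` script. This is a sketch that relies only on the behavior quoted from the bash and dash man pages above: a bare `wait` reports 0 even though the background job failed, while `wait "$pid"` reports the job's real status. The variable names are illustrative.

```shell
#!/bin/sh
# Pitfall: a bare `wait` always returns 0, even if a background job failed.
(exit 2) &
wait
bare_status=$?
echo "bare wait: $bare_status"        # prints "bare wait: 0"

# Passing the PID explicitly returns that job's real exit status.
(exit 2) &
pid=$!
pid_status=0
wait "$pid" || pid_status=$?          # capture status without tripping `set -e`
echo "wait \$pid: $pid_status"        # prints "wait $pid: 2"
```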

"The Good Stuff"

On the plus side, dash's `wait` does wait for multiple processes to complete. On the down side, it looks to me like dash's `wait` only reports the exit status of the last PID waited for (similar to bash). So with multiple jobs, a failing job's exit status can be silently swallowed by `wait`, depending on the order in which the PIDs are listed.

Assuming all of that is correct, I wonder if we should create a convenience function in sharness.d that takes a list of PIDs, waits for each one individually, and returns the largest exit code among them.
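A minimal sketch of what such a helper could look like, assuming plain POSIX `sh` so it behaves the same under dash and bash. The name `wait_all` and the `_wait_all_*` variables are hypothetical, not an existing sharness.d or flux-core function; prefixed globals are used instead of `local` because `local` is not part of POSIX.

```shell
# Hypothetical sharness.d helper (name and placement are assumptions).
# Waits on each PID individually and returns the largest exit status seen,
# so a failure is reported no matter where its PID appears in the list.
wait_all() {
    _wait_all_rc=0
    for _wait_all_pid in "$@"; do
        # Capture the status without tripping `set -e` if it is enabled.
        _wait_all_status=0
        wait "$_wait_all_pid" || _wait_all_status=$?
        if [ "$_wait_all_status" -gt "$_wait_all_rc" ]; then
            _wait_all_rc=$_wait_all_status
        fi
    done
    return "$_wait_all_rc"
}

# Example usage: the failing job is listed first, yet its status survives.
sleep 1 & pid1=$!
(sleep 1; exit 2) & pid2=$!
wait_all "$pid2" "$pid1" || echo "at least one job failed (status $?)"
```

Taking the maximum means a nonexistent PID (127 in bash) or a signal death (128+N) also surfaces as a failure, while the helper still blocks until every listed process has exited, just like a multi-PID `wait`.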

SteVwonder, Mar 14 '20 05:03