autest command errors should log the failing command
When autest runs a command, there is typically a return code check. Per the failure I saw in #9770, the test error message looks like this:
Setting up : Copying '/home/jenkins/workspace/Github_Builds/autest/src/tests/gold_tests/pluginTest/polite_hook_wait/.libs/polite_hook_wait.so' to '/tmp/sandbox/polite_hook_wait/ts/plugin' - Passed
Test : Checking that ReturnCode == 0 - Failed
Reason: Returned Value -9 != 0
Time-Out : Process finishes within expected time - Passed
Reason: Returned value: 10.503296613693237 < 600.0
This doesn't tell you what failed, which makes the failure hard to diagnose. At minimum, autest should log the command that exited with the unwanted return code. It would be even better if it also gave the line in the test file where that command was added.
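As a concrete sketch of what I mean (in Python, since autest is a Python framework; run_checked and its message format are hypothetical, not autest's actual API), the runner could capture both the command and the test-file line at registration time and report them on failure:

import inspect
import shlex
import subprocess

def run_checked(cmd, expected_rc=0):
    # Remember where in the test file this command was added, so a
    # failure can point back at the source line (hypothetical design).
    caller = inspect.stack()[1]
    origin = f"{caller.filename}:{caller.lineno}"

    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != expected_rc:
        raise AssertionError(
            f"Returned Value {proc.returncode} != {expected_rc}\n"
            f"  Command:  {shlex.join(cmd)}\n"
            f"  Added at: {origin}\n"
            f"  stderr:   {proc.stderr.strip()}"
        )
    return proc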
cc @dragon512
It said the return code was not 0; it was -9.
The question here is: what do you expect it to say? Ideally the stream logging in the sandbox has more data, if the program reports something on stdout/stderr or in some custom log file. What do you suggest?
So at a high level, the problem is that there's no clear way to understand what happened. Something ran and exited with -9 (or did it? I'm just assuming that's an exit status). Looking at this output, I believe there is some operation associated with the ReturnCode check. It would be much clearer if the operation itself were shown.
This was in the context of https://github.com/apache/trafficserver/pull/9770. Looking at the test file, there are a number of checks for a zero return code, but the test framework output doesn't help me understand which one failed.
It's not even totally clear to me whether this return code error is the result of the curl exit status. As you point out, the test framework shows that the value was "-9", and curl never returns that status AFAIK.
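One hint: assuming autest launches processes through Python's subprocess module, a negative return code is Python's convention for "terminated by signal N", so -9 would mean the process received SIGKILL rather than curl choosing to exit with that status. A quick way to translate such a value:

import signal

rc = -9  # the value the framework reported
if rc < 0:
    # subprocess reports "terminated by signal N" as returncode -N
    print(f"killed by {signal.Signals(-rc).name}")  # prints: killed by SIGKILL
else:
    print(f"exited with status {rc}")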
So in the report, I understand what you mean: the shell giving a value of -9 is not useful. However, if you add --verbose and/or --trace to the test run, you will see a lot more detail about what happened as the test ran.
I feel in cases like this it would be nice to have some sort of HTML report that makes the data easier to inspect, since it could hide noise unless the user clicks a UI element to expand it (and in this case that noise would be useful).
I do want to add the ability to have more than one report type, which would help address this issue. However, given what exists today, I would run the test with --verbose and/or --trace to get more detail, and look at the sandbox output logs for any other details. I am guessing this is a bash shell return code; I believe we have seen this in cases of running Docker in a CI system, something to do with processes becoming zombies. If that is the case, we have a setting to ignore certain return codes globally.
Still, I agree I would like to do something better here.
In this case (a command failed?), can we show an actionable error instead of just the return code? Like which command failed, its error output, or something?
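To make that concrete, a failure report along these lines (the command and file location shown are invented for illustration) would be actionable:

Test : Checking that ReturnCode == 0 - Failed
Reason: Returned Value -9 != 0 (process killed by SIGKILL?)
Command: curl --verbose http://127.0.0.1:8080/
Added at: the_test.test.py:42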
https://ci.trafficserver.apache.org/job/Github_Builds/job/autest/8237/consoleFull is another example:
Test : Checking that ReturnCode == 0 - Failed
Reason: Returned Value -9 != 0
Not sure whether I have access, or whether having access would let me get at the detailed logs that contain the error diagnostics. But it would be super helpful if the command that produced the bad return code were logged along with the code.