install.sh exits with no explanation on command errors
The current version of install.sh runs "set -o errexit", which immediately exits the script if any internal command fails. On a fresh install of Ubuntu 18.04 we have a screenshot where the installer failed and did not show any error message (or the intended "trap '__err $LINENO' ERR" line number output. (Unable to attach, but the last two lines of output were: [-] Updating packages... SUCCESS xxxxxxxxxxxxx$ is installed... My best guess is that there was a temporary failure in installing one of the required packages. After manually running "sudo apt install curl" - which succeeded this time - we restarted the installer and it continued beyond the above message. Could we consider any of the following?
- Not using set -o errexit
- Testing the return codes of individual commands for failure and being explicit about where the problem is
- Not using "-qq" with apt (and respectively "-q" with yum) to hide all output 3a) Continue using the above flags, but if there's a non-zero return code from the command, rerun it without those flags so the user can see what the issue is.
Hmm, I recall seeing something similar to this a while ago, perhaps. I can look into this.
@william-stearns: This issue only appears upon a first run of the installer. Running the installer a second time does not cause this behaviour. I'm looking into the fix now.
[EDIT]: I think I figured out what's happening. My VM clone wasn't up to date, and apt was getting locked by the daily installer. Can you verify that is similar to what was happening on your end? Now that my VM was updated before cloning, the installer ran without incident.
Thanks!
Solution will likely be a check in the script to ensure that no PIDs exist for apt or dpkg. I'm still looking into it.
There's an additional discussion about the fact that we have very quiet output in general, commonly discarding error information to keep the customer's terminal clean. Ethan put it best: "1) installer output to console reduced and 2) debugging/troubleshooting output increased but to a log file. I really like it when an installer is succinct but when encounters an error gives a brief error message and points to the full log file output."
Another example - 2 hours on the phone trying to debug why an install failed. After clearing up some background customer issues and troubleshooting we finally came down to this line failing in the AI-Hunter install:
$SUDO apt-get -qq update > /dev/null 2>&1
Apt is told to not provide any output, and if it gave any, the shell destroyed it. There was no check to see if the command succeeded, other than the fact that the shell caught the return code. The customer - and we - are left with no reason for the failure; we have to track each one down by hand. I believe this is the 4th or 5th time I've had to take customer calls with the installer failing to provide enough data to diagnose problems. They've not all been the same command, so while it may help to resolve issues with individual commands, we need to stop deliberately destroying error output.
Recommended function to run commands with output saved to a shared log file. It saves the exact command that was run, saves stderr and stdout to the log file, and warns the user if the command returned a non-0 return code (giving the user a chance to say exit or continue anyways).
export log_to="/tmp/installer.date '+%Y%m%d%H%M%S'`.txt"
touch "$log_to"
#Note, askYN() is already loaded in acmlib.sh. It's only in here for independent testing. askYN () { # Prints a question mark, reads repeatedly until the user # repsonds with t/T/y/Y or f/F/n/N. TESTYN="" while [ "$TESTYN" != 'Y' ] && [ "$TESTYN" != 'N' ] ; do echo -n '? ' >&2 read -e TESTYN <&2 || : case $TESTYN in T*|t*|Y*|y*) TESTYN='Y' ;; F*|f*|N*|n*) TESTYN='N' ;; esac done
if [ "$TESTYN" = 'Y' ]; then
return 0 #True
else
return 1 #False
fi
}
#Note, fail() is already loaded in acmlib.sh. It's only in here for independent testing. fail () { # Displays the passed in error and asks the user if they'd like to continue # the script. Will exit with error code 1 if the user stops the script. echo "$*" >&2 echo "This is normally an unrecoverable problem, and we recommend fixing the problem and restarting the script. Please contact technical support for help in resolving the issue. If you feel the script should continue, enter Y and the script will attempt to finish. Entering N will cause this script to exit." >&2 if askYN ; then echo "Script will continue at user request. This may not result in a working configuration." >&2 sleep 5 else exit 1 fi }
log_output () { #Save the output from a command to a file. If the return code is non-zero, let the user know the name of the file to see for debug info.
if [ -z "$log_to" ]; then
echo "the log_to variable is not set, exiting."
exit 1
fi
if [ ! -w "$log_to" ]; then
fail "$log_to is not writeable, exiting"
fi
lo_retval=''
echo "======== Running ""$@" >>"$log_to"
"$@" >>"$log_to" 2>&1
lo_retval=$?
if [ "$lo_retval" -ne 0 ]; then
echo "The following command was run:" >&2
echo "$@" >&2
echo "The return code was $lo_retval , likely indicating a failure." >&2
fail "More detail about this error can be found in $log_to"
fi
return "$lo_retval"
}
echo "log to is $log_to" log_output sleep 3 log_output /bin/echo command2 log_output /usr/bin/nonexistantcommand log_output false `
Sample log created from the above:
======== Running sleep 3 ======== Running /bin/echo command2 command2 ======== Running /usr/bin/nonexistantcommand ./log_output.sh: line 56: /usr/bin/nonexistantcommand: No such file or directory ======== Running false
Cool, thanks for that! I'll roll something similar into branch in the next couple of days and let you know.