duplicacy icon indicating copy to clipboard operation
duplicacy copied to clipboard

The 0 exit code is a little misleading

Open drsound opened this issue 7 years ago • 18 comments

As it is implemented right now, the 0 exit code is a little misleading: as it happens for every other program I know, a 0 exit code should be returned if absolutely no problem occurred, not even a warning, so that, for example, you can use it in a script to send an email saying "backup successful".

I just executed a backup, a file was not backed up, but I still got a 0 exit code!

Some questions I asked myself:

  1. What if it was an important file?
  2. What if 99% of the files for some reasons were not backed up? Would I still get a 0 exit code?
  3. Should I always "grep" the backup output looking for signs of trouble?
  4. What if I grep for some specific text but in some future release the warning message changes? I will think my backups are fine but I could miss some files!

I think exit codes were invented to avoid this kind of problems.

My proposal: why not to add another exit code in case of any warning or problem during the backup?

Here is my backup output that gave me 0 exit code, look at the last line:

Backup for / at revision 3 completed
Files: 1278499 total, 573,580M bytes; 183 new, 70,170K bytes
File chunks: 116957 total, 573,807M bytes; 11 new, 62,841K bytes, 19,645K bytes uploaded
Metadata chunks: 92 total, 482,955K bytes; 16 new, 99,754K bytes, 28,975K bytes uploaded
All chunks: 117049 total, 574,278M bytes; 27 new, 162,596K bytes, 48,620K bytes uploaded
Total running time: 00:02:30
1 file was not included due to access errors

drsound avatar Feb 13 '19 09:02 drsound

Changing the exit code at this time is not a good idea. It would break some existing users' scripts.

I think you can just grep the output to make sure that no files are skipped. These specific log messages are unlikely to be changed in the future.

gilbertchen avatar Feb 14 '19 18:02 gilbertchen

What about adding an argument that lets us change the exit code behavior? If the argument isn't specified, the behavior stays the same, but if we specify an argument like -detailed-exit-code we could have a more detailed exit code mapping that can tell us if something failed to upload or similar? Grepping the output is really a bad solution here.

Thalagyrt avatar Mar 12 '19 17:03 Thalagyrt

@gilbertchen is afraid of breaking existing backup scripts. I try to analyze the possible cases to show there would be no breaking.

Every kind of backup script I can think of, can do 4 things with the exit code:

  1. Ignore it (Very bad thing!)
  2. Tell the user "everything is fine" if it's 0 (Bad thing! What if you have a lot of missing files?)
  3. Tell the user "everything is fine" if it's 0 AND grepping the output you haven't any "not included files" (Ok, but it leads to ugly and inefficient scripts. Moreover, where in the duplicacy documentation is written you should do this grepping to be sure you have a correct backup? I would bet many people don't do it!)
  4. Tell the user "something went wrong" if it's not 0, with optional variants depending on the exact exit code value

If we add an additional exit code for "not included files" the script behaviour in the previous 4 cases would change as follows:

In case there aren't "not included files" (most common situation):

  1. No change
  2. No change
  3. No change
  4. No change

In case there are "not included files":

  1. No change
  2. Script behaviour would be better, because it will not give the user a false sense of security, a wrong "everything is fine" message
  3. The exit code will be 0 no more, so the grepping will not be done, but every script checking for a 0 exit code, for sure also checks for a non-zero one, so the error will be shown by next case # 4
  4. No change in case the script doen't check the exact non-zero exit code. In case the script checks the exact exit code to show different error messages, the new exit code will not be recognized, so a generic error will be shown.

Now, in my opinion the higher priority should be not to give false sense of security (case # 2).

Anyway, if @gilbertchen does not agree with directly adding a new exit state, I think @Thalagyrt solution is the second best we could have.

drsound avatar Mar 12 '19 19:03 drsound

Good analysis. I'm definitely in agreement with @drsound here. I don't see any way in which adding a status code for incomplete/failed backups would break existing scripts in a bad way.

If someone's ignoring the exit code, there really is no change. If someone's using set -e or otherwise checking exit codes, then they're going to get an immediate improvement in that their script will know about the failure from the exit code immediately.

Further, if someone's checking exit codes, they pretty much are guaranteed to be knowledgeable enough to have the script send an alert on non-zero exit codes or otherwise just abort on failure and call into something like dead man's snitch on success, and the script aborting on the non-zero would cause DMS or similar to alert since the final check-in would short circuit out.

Thalagyrt avatar Mar 12 '19 19:03 Thalagyrt

Agreed, a return code of zero with a less than successful backup is not good. In my opinion, a non-zero exit code should be used if a file included (explicitly or implicitly) in the backup is not successfully backed up

leftytennis avatar May 29 '19 16:05 leftytennis

So I'm running running duplicacy web and it shows all my backups as succeeding (they were green). But when I clicked on the backup status I saw a bunch of permission error warnings, so half my data wasn't actually backed up. This is a very scary situation to be in. The exit code needs to indicate these failures.

pylint uses a bit mask as the exit code. That might be useful so the log files don't need to be parsed to determine what type of errors happened.

ismell avatar Aug 16 '19 15:08 ismell

Hey @gilbertchen, I'd just like to draw attention to this issue again - it's creating needless complexity and resource waste in scripts due to having to grep for error messages instead of read a simple exit code, and using exit codes would fall in line with how every other POSIX based tool behaves. Always exiting 0 even in failure is a pretty clear and cut violation of the principle of least astonishment and most people who are experienced with any POSIX compliant system are going to assume 0 means all's good, leading to the various situations mentioned above where backups are silently failing and people have no idea. I really see no way in which this would break people's scripts for the worse. I'd love to discuss a bit more in depth if you're up for it at some point.

Thalagyrt avatar Dec 04 '19 16:12 Thalagyrt

If the concern is that changing error codes is going to break people's scripts, I like @Thalagyrt's idea of adding something like --posix-exit-codes to trigger the behavior of exiting nonzero when it runs into issues during the backup.

It's scary to think that backups could be failing for a long time with no easy way to detect the failure in my backup wrapper scripts.

unixorn avatar May 03 '20 23:05 unixorn

@gilbertchen I would also like to draw attention to the issue of returning exit 0 on unsuccessful backups. I am using the web version and if it states, that the backup was successful, it should be successful. A backup solution is only good when one can trust it. So even if you stand by your point in not wanting to possibly break user scripts, the UI's "completed successfully" is wrong

ysGrob avatar Dec 03 '20 10:12 ysGrob

Hi @gilbertchen - Is this still accurate in 2021? If that's the case, I'd also like to draw attention to this...

xxxliqu1dxxx avatar May 04 '21 12:05 xxxliqu1dxxx

Any news on this issue?

Dex321x avatar Jun 02 '22 12:06 Dex321x

Changing the exit code at this time is not a good idea. It would break some existing users' scripts.

I find it hard to follow this justification. If scripts are written properly they already handle errors on a non-zero exit status, including an unknown error. Having the error properly reported will follow a proper error-handling path in the user's scripts.

If the scripts are already written to grep through logs and disregard the return code, then their behavior is not expected to change.

The only scenario when a user script's behavior will be materially different is if, after detecting a no-zero RC, it is no longer grepping and so will report a generic error and not a specific one from grep - an inconvenience to a user, but not a data loss.

Still, changing the return code meaning is a breaking change, even if to address this design bug. I think this is a big enough bug to warrant a major release version bump with proper announcements of the backward incompatibility.

didenko avatar Aug 06 '23 12:08 didenko

I ended up switching to restic because of this. I can't trust the backups and am uninterested in parsing logs to find errors - I'll never be confident that I've addressed every possible error case. It's a shame, I really like duplicacy other than this one very serious issue.

Add a --posix-exit-codes option to trigger exiting non-zero for editing if you're worried about breaking existing wrapper scripts so that people don't have to write a bunch of brittle log parsing code.

unixorn avatar Aug 06 '23 13:08 unixorn

FWIW this is the post-backup script I wrote :/

#!/bin/bash -ex

# exec 1>/tmp/dup-backup.log
# exec 2>&1

cd "$HOME/.duplicacy-web/logs"
BACKUP_FILE="$(find -iname 'backup-*' -ctime -1 | sort | tail -n 1)"

if [[ ! -e "$BACKUP_FILE" ]]; then
        curl --retry 3 -X POST -d 'No backup file found' \
                 https://hc-ping.com/XXXXXXX/fail
        exit 1
fi

if grep -q ' WARN ' "${BACKUP_FILE}"; then
        curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX/fail
        exit 1
elif grep -q ' BACKUP_SKIPPED ' "${BACKUP_FILE}"; then
        curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX/fail
        exit 1
elif grep -q ' BACKUP_END ' "${BACKUP_FILE}"; then
        curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX
else
        curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX/fail
        exit 1
fi

ismell avatar Aug 08 '23 18:08 ismell

This issue has been mentioned on Duplicacy Forum. There might be relevant details there:

https://forum.duplicacy.com/t/warnings-during-backup-not-surfaced-to-web-ui/8785/1

gilbertchen avatar May 29 '24 21:05 gilbertchen