Backup failed during copyBackup: No tomes available
**I'm submitting a ...**
- [ ] bug report
- [x] feature request
- [ ] support request
**What is the current behavior?**
If an error occurs during copyBackup (the upload to the remote location) - specifically B2's "no tomes available" - there is no retry. The backup is then deleted locally after the failed upload, so effectively no backup was made at all.
**If the current behavior is a bug, please provide the configuration and steps to reproduce and if possible a minimal demo of the problem.**
(Fortunately) it is rather hard to reproduce this exact error, since it apparently means that B2 did not have enough available backup servers to accept the backup/upload request. This has only happened once for me.
**What is the expected behavior?**
If there were a way to determine whether an error during copyBackup is one that could be mitigated by retrying, that would be great. At a minimum, the specific "no tomes available" error could trigger a retry of the upload. Also, judging by the logs, the backup was deleted locally even though the upload failed. I would have expected it not to be deleted; that way it could perhaps have been uploaded in the next iteration of the backup schedule (every night).
**What is the motivation / use case for changing the behavior?**
Increased stability and a lower chance of failed/lost backups.
**Please tell us about your environment:**
- Image version: 2.20.0
- Docker version: 20.10.17
- docker-compose version: 1.29.2 (docker-compose) and 2.6.0 (docker compose)
**Other information** (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, e.g. stackoverflow, etc.)
```
Running docker-volume-backup failed with error: copyBackup: error uploading backup to remote storage: no tomes available

Log output of the failed run was:

time="2022-07-20T04:00:04+02:00" level=info msg="Created backup of /backup at /tmp/backup-2022-07-20T04-00-00.tar.gz."
time="2022-07-20T04:00:04+02:00" level=info msg="Encrypted backup using given passphrase, saving as /tmp/backup-2022-07-20T04-00-00.tar.gz.gpg."
time="2022-07-20T04:01:00+02:00" level=error msg="Fatal error running backup: copyBackup: error uploading backup to remote storage: no tomes available"
time="2022-07-20T04:01:00+02:00" level=info msg="Removed tar file /tmp/backup-2022-07-20T04-00-00.tar.gz."
time="2022-07-20T04:01:00+02:00" level=info msg="Removed GPG file /tmp/backup-2022-07-20T04-00-00.tar.gz.gpg."
```
I don't think I would like to add code paths that cater to the specific behavior of Backblaze here.
However, the S3 protocol does have the concept of a retryable error, and the MinIO client used by this image knows when it can retry an operation because the error code is retryable (see https://github.com/minio/minio/discussions/14158). Do you know which status code Backblaze returns in these situations?
I agree, a universal solution for the S3 API would be better here.
**B2**
According to this (and this), it will send a 503 status code. The article also explains that a 500 is fine to retry after a few seconds.
**AWS**
In the case of AWS the messages are more fine-grained, but all the 503 ones have "try again" or "slow down" in common; the 500 error also states "try again" (see here).
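The status-code mapping described above could be expressed as a small Go helper. The set of codes below is an assumption loosely based on the linked discussion of B2 and AWS behavior, not a copy of minio-go's actual retry list:

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetryable reports whether an HTTP status code from the storage
// backend signals a transient condition worth retrying. The selection
// here is a sketch based on the B2/AWS guidance discussed above.
func isRetryable(status int) bool {
	switch status {
	case http.StatusInternalServerError, // 500: B2 and AWS both say "try again"
		http.StatusBadGateway, // 502: transient gateway failure
		http.StatusServiceUnavailable, // 503: e.g. B2's "no tomes available"
		http.StatusTooManyRequests: // 429: throttling ("slow down")
		return true
	}
	return false
}

func main() {
	for _, code := range []int{500, 503, 404} {
		fmt.Printf("%d retryable: %v\n", code, isRetryable(code))
	}
}
```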
It seems both 500 and 503 should be retried: https://github.com/minio/minio-go/blob/a2b545423f4d967f656288e5c0a741041be4ded7/retry.go#L110-L119, so I would assume that either a. retrying is broken or b. your setup hit the maximum number of retries.
I'll check whether a. is the case or not.