Multiple Backup Repo doesn't include base backup to repo2 when configured for S3
Overview
When configuring a new cluster's pgbackrest with multiple repos - repo1 being local and repo2 being s3 - the s3 repo does not receive a copy of the base backup.
Environment
Please provide the following details:
- Platform: Kubernetes
- Platform Version: 1.24.4
- PGO Image Tag: crunchy-postgres:ubi8-14.5-0, crunchy-pgbackrest:ubi8-2.40-0
- Postgres Version: 14
- Storage: local-path and/or block storage
Steps to Reproduce
REPRO
Provide steps to get to the error condition:
- Use kustomize/myconfig/postgres.yaml (based on the multi-backup-repo example combined with the ha example). Note that repo1 is local and repo2 is remote s3:
kind: PostgresCluster
metadata:
  name: postgres1
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.5-0
  postgresVersion: 14
  instances:
    - name: pgha1
      replicas: 2
      dataVolumeClaimSpec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/cluster: postgres1
                    postgres-operator.crunchydata.com/instance-set: pgha1
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-0
      configuration:
        - secret:
            name: postgres1-creds
      global:
        repo2-path: /pgbackrest/postgres-operator/postgres1
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - "ReadWriteOnce"
              resources:
                requests:
                  storage: 1Gi
        - name: repo2
          s3:
            bucket: "my-s3-bucket"
            endpoint: "s3.amazonaws.com"
            region: "us-east-1"
- Run kubectl apply -k kustomize/myconfig
- Wait for the cluster to come up and the base backup to complete.
EXPECTED
- Both the local and s3 repo ./archive dirs will show a backup has occurred (a '*.backup' file will be created), and both ./backup dirs will contain the base backup manifest and files.
ACTUAL
- Both the local and s3 repo ./archive dirs will show a backup has occurred (a '*.backup' file will be created), and the local repo ./backup dir will contain the base backup and manifest files, but the s3 repo ./backup dir shows no sign of the base backup. This makes the s3 repo unusable for replication or a restore.
Logs
Logs from the backup job:
time="2022-09-13T19:58:44Z" level=info msg="crunchy-pgbackrest starts"
time="2022-09-13T19:58:44Z" level=info msg="debug flag set to false"
time="2022-09-13T19:58:44Z" level=info msg="backrest backup command requested"
time="2022-09-13T19:58:44Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1]"
time="2022-09-13T19:59:11Z" level=info msg="output=[]"
time="2022-09-13T19:59:11Z" level=info msg="stderr=[WARN: option 'repo1-retention-full' is not set for 'repo1-retention-full-type=count', the repository may run out of space\n HINT: to retain full backups indefinitely (without warning), set option 'repo1-retention-full' to the maximum.\nWARN: option 'repo2-retention-full' is not set for 'repo2-retention-full-type=count', the repository may run out of space\n HINT: to retain full backups indefinitely (without warning), set option 'repo2-retention-full' to the maximum.\nWARN: no prior backup exists, incr backup has been changed to full\n]"
time="2022-09-13T19:59:11Z" level=info msg="crunchy-pgbackrest ends"
Additional Information
When only an s3 repo is configured, the backup is stored correctly. I have not tested the inverse scenario of s3 as repo1 and local as repo2.
The issue seems to be occurring due to the way this command is formulated:
// Reconcile the initial backup that is needed to enable replica creation using pgBackRest.
// This is done once stanza creation is successful
if err := r.reconcileReplicaCreateBackup(ctx, postgresCluster, instances,
repoResources.replicaCreateBackupJobs, sa, configHash, replicaCreateRepo); err != nil {
log.Error(err, "unable to reconcile replica creation backup")
result = updateReconcileResult(result, reconcile.Result{Requeue: true})
}
Note that replicaCreateRepo will always be the last volume-mounted repo (if one exists) returned by the earlier call to r.reconcileRepos(). However, for multi-repo deployments, this probably needs to be empty, since all repos will need the base backup (and the --repo switch should be left out of the backup command). Am I reading this incorrectly?
This is expected. You'll need to schedule backups for your repositories as described in https://access.crunchydata.com/documentation/postgres-operator/latest/tutorial/backup-management/
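For reference, here is a minimal sketch of what per-repo scheduled backups look like in the PostgresCluster spec, per the backup management tutorial linked above. The cron expressions below are illustrative placeholders, not recommendations:

```yaml
# Sketch only: adds schedules to each repo so every repository
# (including the s3 one) receives its own backups.
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"           # e.g. weekly full backup
            differential: "0 1 * * 1-6" # e.g. daily differential
        - name: repo2
          schedules:
            full: "0 3 * * 0"           # e.g. weekly full backup to s3
```

Once applied, PGO creates CronJobs that run pgBackRest backups against each listed repo on its own schedule.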
@cbandy I see how this can be done, and appreciate your response. But the documentation on that page is misleading:
PGO sets up your Postgres clusters so that they are continuously archiving the write-ahead log: your data is constantly being stored in your backup repository. Effectively, this is a backup!
However, in a disaster recovery scenario, you likely want to get your Postgres cluster back up and running as quickly as possible (e.g. a short “recovery time objective (RTO)”). What helps accomplish this is to take periodic backups. This makes it faster to restore!
The wording seems to imply that scheduled backups are a best practice, a 'nice to have' if you will. But since the base backup does not occur on external repos, the continuous archiving to an external repo does not (yet) act as a backup! A scheduled backup should be considered a baseline requirement, not just a way to make restores 'faster'.
However, given that a base backup is already happening as part of any cluster deployment, would it not make sense to make this happen for each repo? This would make the behavior consistent with the documented intent, and avoid surprising the user with the variance in behavior between local and remote repos.
As described above, the behavior described in this issue is expected. More specifically, backups need to be scheduled for the various pgBackRest repositories, as described in this earlier comment: https://github.com/CrunchyData/postgres-operator/issues/3381#issuecomment-1247018766.
If you have any additional questions about scheduled backups, or anything else related to disaster recovery within Crunchy Postgres for Kubernetes, please feel free to reach out via the PGO project community Discord server.
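For anyone hitting this before a schedule fires: a one-off backup can also be pointed at the s3 repo using the manual backup feature from the same backup management tutorial. A sketch, assuming the cluster from the example above (the --type option value is illustrative):

```yaml
# Sketch only: declare a manual backup that targets repo2 (the s3 repo).
spec:
  backups:
    pgbackrest:
      manual:
        repoName: repo2
        options:
          - --type=full
```

The backup is then triggered by annotating the cluster, e.g. kubectl annotate postgrescluster postgres1 postgres-operator.crunchydata.com/pgbackrest-backup="$(date)". This populates the s3 repo's ./backup dir with a base backup, making it usable for restores.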