postgresql cannot boot because of the database backup will not be usable without all the WAL segments
Please, answer some short questions which should help us to understand your problem / question better?
- Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
- Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s
- Are you running Postgres Operator in production? no
- Type of issue? feature request
Some general remarks when posting a bug report:
- Please, check the operator, pod (Patroni) and postgresql logs first. When copy-pasting many log lines please do it in a separate GitHub gist together with your Postgres CRD and configuration manifest.
- If you feel this issue might be more related to the Spilo docker image or Patroni, consider opening issues in the respective repos.
Hi, I create a postgresql instance with follow config:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
name: acid-test
spec:
teamId: "acid"
numberOfInstances: 2
enableMasterLoadBalancer: true
postgresql:
version: "14"
parameters:
wal_level: "archive"
# archive_command: 'sh /script/backup.sh'
volume:
size: 1Gi
additionalVolumes:
- name: backup-script
mountPath: /tmp
targetContainers:
- all
volumeSource:
configMap:
name: backup-script
env:
- name: AWS_ACCESS_KEY_ID
value: minioadmin
- name: AWS_SECRET_ACCESS_KEY
value: minioadmin
- name: PGUSER
valueFrom:
secretKeyRef:
key: username
name: postgres.acid-test.credentials.postgresql.acid.zalan.do
- name: PGPASSWORD
valueFrom:
secretKeyRef:
key: password
name: postgres.acid-test.credentials.postgresql.acid.zalan.do
- name: AWS_ENDPOINT
value: http://10.10.15.114:9001
- name: AWS_S3_FORCE_PATH_STYLE
value: "true"
- name: USE_WALG_BACKUP
value: "true"
- name: WALG_S3_PREFIX
value: s3://test
resources:
requests:
cpu: 10m
memory: 100Mi
limits:
cpu: 1000m
memory: 1Gi
after commit to k8s , it can't create , like this:
zuvakin@linux:~/projects/k8s/postgresql$ kubectl get pod -n test
NAME READY STATUS RESTARTS AGE
acid-test-1 0/1 Running 0 9m1s
acid-test-0 1/1 Running 0 9m11s
acid-test-1 can't boot nornally, and logs as below:
2022-07-26 07:51:21,425 - bootstrapping - INFO - Configuring standby-cluster
2022-07-26 07:51:21,425 - bootstrapping - INFO - Configuring patroni
2022-07-26 07:51:21,431 - bootstrapping - INFO - Writing to file /run/postgres.yml
2022-07-26 07:51:22,635 WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
2022-07-26 07:51:22,644 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-07-26 07:51:22,646 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:22,657 INFO: trying to bootstrap from leader 'acid-test-0'
2022-07-26 07:51:24,798 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:24,811 INFO: bootstrap from leader 'acid-test-0' in progress
Can not find any backups
2022-07-26 07:51:24,832 ERROR: Error creating replica using method wal_e: envdir /run/etc/wal-e.d/env bash /scripts/wale_restore.sh exited with code=1
2022-07-26 07:51:34,837 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:34,837 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:51:44,788 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:44,788 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:51:54,795 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:51:54,795 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:04,789 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:04,789 INFO: bootstrap from leader 'acid-test-0' in progress
1024+0 records in
1024+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.0341451 s, 491 MB/s
NOTICE: base backup done, waiting for required WAL segments to be archived
2022-07-26 07:52:14,832 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:14,832 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:24,795 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:24,795 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:34,793 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:34,793 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:44,795 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:44,796 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:52:54,787 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:52:54,787 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:04,791 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:04,791 INFO: bootstrap from leader 'acid-test-0' in progress
WARNING: still waiting for all required WAL segments to be archived (60 seconds elapsed)
HINT: Check that your archive_command is executing properly. You can safely cancel this backup, but the database backup will not be usable without all the WAL segments.
2022-07-26 07:53:14,835 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:14,836 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:24,788 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:24,788 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:34,832 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:34,832 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:44,792 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:44,793 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:53:54,825 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:53:54,825 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:04,783 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:04,783 INFO: bootstrap from leader 'acid-test-0' in progress
WARNING: still waiting for all required WAL segments to be archived (120 seconds elapsed)
HINT: Check that your archive_command is executing properly. You can safely cancel this backup, but the database backup will not be usable without all the WAL segments.
2022-07-26 07:54:14,799 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:14,799 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:24,831 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:24,831 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:34,800 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:34,800 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:44,798 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:44,798 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:54:54,828 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:54:54,828 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:04,798 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:04,798 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:14,801 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:14,801 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:24,824 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:24,824 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:34,781 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:34,781 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:44,784 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:44,785 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:55:54,828 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:55:54,828 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:04,795 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:04,795 INFO: bootstrap from leader 'acid-test-0' in progress
WARNING: still waiting for all required WAL segments to be archived (240 seconds elapsed)
HINT: Check that your archive_command is executing properly. You can safely cancel this backup, but the database backup will not be usable without all the WAL segments.
2022-07-26 07:56:14,789 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:14,789 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:24,825 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:24,825 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:34,797 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:34,797 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:44,793 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:44,793 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:56:54,827 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:56:54,827 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:04,793 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:57:04,793 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:14,794 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:57:14,794 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:24,790 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:57:24,791 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:34,836 INFO: Lock owner: acid-test-0; I am acid-test-1
2022-07-26 07:57:34,836 INFO: bootstrap from leader 'acid-test-0' in progress
2022-07-26 07:57:44,793 INFO: Lock owner: acid-test-0; I am acid-test-1
it appear ' Check that your archive_command is executing properly. You can safely cancel this backup, but the database backup will not be usable without all the WAL segments'
what's wrong with it , does the CR correct?
It is not a problem of postgre-operator or spilo. They only operate with the configuration you provide. If something is broken - you have to figure out what and fix it.
@godzilla-s Just stumbling over it while checking for a different backup-related issue: does the configuration style mentioned in this other issue answer help? https://github.com/zalando/postgres-operator/issues/1965#issuecomment-1185387027
@godzilla-s you need to check logs of pod acid-test-0
This trouble is occured because acid-test-0 can't connect to S3 storage for make base backup and archive wal files.
And acid-test-1 is waiting for all wal files from S3 storage.
When I investigated this trouble I entered into pod acid-test-0 and run a script for making backup:
envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
and checked an errors.
There is example of good logs of master pod
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
2022-08-09 10:03:53,205 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:03,246 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:13,204 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:23,228 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:33,187 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:37.074 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2022-08-09 10:04:37.148 - /scripts/postgres_backup.sh - producing a new backup
INFO: 2022/08/09 10:04:37.254145 Calling pg_start_backup()
INFO: 2022/08/09 10:04:37.779208 Starting a new tar bundle
INFO: 2022/08/09 10:04:37.779243 Walking ...
INFO: 2022/08/09 10:04:37.780051 Starting part 1 ...
2022-08-09 10:04:38.075 38 LOG Starting pgqd 3.3
2022-08-09 10:04:38.075 38 LOG auto-detecting dbs ...
INFO: 2022/08/09 10:04:39.741977 Packing ...
INFO: 2022/08/09 10:04:39.744801 Finished writing part 1.
INFO: 2022/08/09 10:04:39.856157 Starting part 2 ...
INFO: 2022/08/09 10:04:39.856185 /global/pg_control
INFO: 2022/08/09 10:04:39.858861 Finished writing part 2.
INFO: 2022/08/09 10:04:39.858872 Calling pg_stop_backup()
INFO: 2022/08/09 10:04:41.166761 Starting part 3 ...
INFO: 2022/08/09 10:04:41.180452 backup_label
INFO: 2022/08/09 10:04:41.180472 tablespace_map
INFO: 2022/08/09 10:04:41.180534 Finished writing part 3.
INFO: 2022/08/09 10:04:41.210225 Wrote backup with name base_00000007000000000000001D
2022-08-09 10:04:43,251 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:04:53,205 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:05:03,185 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
2022-08-09 10:05:08.104 38 LOG {ticks: 0, maint: 0, retry: 0}
2022-08-09 10:05:13,227 INFO: no action. I am (id567-pgsql-92f50bc3-1), the leader with the lock
leader's pod botstrap is finished only after S3 backup is done
something wrong with s3 server, thanks all
it worked now