Unable to update pgBackRest configuration from single local volume repo to multirepo - local and s3 (MinIO)

Open rmiguelac opened this issue 3 years ago • 2 comments

Overview

I first deployed the Postgres cluster through helm, without any backups configured. Now that I have time to improve the deployment, I tried to use the multirepo backup strategy, with backups stored both locally in Kubernetes volumes (backed by rook-ceph) and in MinIO, external to the cluster.

The issue happens when I update the postgrescluster CR configuration to add the second repo (repo2) with the MinIO (s3) configuration. To be clear: I have a Postgres cluster with no backups configured. If I add only local volumes for backups, the configuration change is applied as expected. If I try to add both local and s3, the operator reports that the pgBackRest configuration changed and that there is a hash mismatch. On the other hand, if I create the cluster from scratch with the same multirepo configuration, rather than updating it in place, the operator does not complain.

Environment

Please provide the following details:

  • Platform: Rancher
  • Platform Version: 1.22.9
  • PGO Image Tag: ubi8-5.1.1-0
  • Postgres Version: 14
  • Storage: rook-ceph storageclass

Steps to Reproduce

1. Install cluster using helm (pgo then postgres) without pgBackRest configuration.
2. Install cluster using helm (pgo then postgres, although pgo won't change) providing pgBackRest configuration for both local and s3 (MinIO) repos.
3. Operator will throw:

time="2022-07-25T19:47:28Z" level=info msg="pgBackRest config hash mismatch detected, requeuing to reattempt stanza create" file="internal/controller/postgrescluster/pgbackrest.go:1328" func="postgrescluster.(*Reconciler).reconcilePGBackRest" name=postgres namespace=namespacename reconciler=pgBackRest reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.1-0

EXPECTED

The operator should not complain, and backups should start automatically, as happens with a single-repo configuration.

ACTUAL

The stanza create is requeued and keeps retrying.

Logs

Operator log:

time="2022-07-25T19:47:28Z" level=info msg="pgBackRest config hash mismatch detected, requeuing to reattempt stanza create" file="internal/controller/postgrescluster/pgbackrest.go:1328" func="postgrescluster.(*Reconciler).reconcilePGBackRest" name=postgres namespace=namespacename reconciler=pgBackRest reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.1-0

pgo helm values:

singleNamespace: true
debug: false
imagePullSecretNames:
  - "secret-x"

controllerImages:
  cluster: "registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi8-5.1.1-0"
  upgrade: "registry.developers.crunchydata.com/crunchydata/postgres-operator-upgrade:ubi8-5.1.1-0"

# relatedImages are used when an image is omitted from PostgresCluster or PGUpgrade specs.
relatedImages:
  postgres_14:
    image: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.3-0"
  postgres_14_gis_3.1:
    image: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-14.3-3.1-0"
  postgres_13:
    image: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-13.7-0"
  postgres_13_gis_3.1:
    image: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-13.7-3.1-0"
  pgadmin:
    image: "registry.developers.crunchydata.com/crunchydata/crunchy-pgadmin4:ubi8-4.30-1"
  pgbackrest:
    image: "registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-1"
  pgbouncer:
    image: "registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.16-3"
  pgexporter:
    image: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.1.1-0"
  pgupgrade:
    image: "registry.developers.crunchydata.com/crunchydata/crunchy-upgrade:ubi8-5.1.1-0"

resources:
  limits:
    cpu: 100m
    memory: 100Mi
  requests:
    cpu: 100m
    memory: 100Mi

Working postgrescluster CR helm values:

name: "postgres"
postgresVersion: 14
postGISVersion: 3.1
monitoring: true
imagePostgres: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-14.3-3.1-0"
imagePgBackRest: "registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-1"
imagePgBouncer: "registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.16-3"
imageExporter: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.1.1-0"
imagePullSecrets:
  - name: "secret-x"
instanceName: postgres-instance
instanceSize: 30Gi
patroni:
  dynamicConfiguration:
    postgresql:
      pg_hba:
        - "hostnossl all all all md5" # Required to allow non-tls connections
      parameters:
        shared_buffers: 3140MB
        work_mem: 4MB
        maintenance_work_mem: 1570MB
        max_connections: 200
        effective_io_concurrency: 200
        max_worker_processes: 19
        max_parallel_workers_per_gather: 4
        wal_buffers: 16MB
        max_wal_size: 1GB
        min_wal_size: 512MB
        random_page_cost: 1.1
        effective_cache_size: 9421MB
        default_statistics_target: 500
        log_timezone: 'UTC'
        ssl: "off"
        autovacuum_max_workers: 10
        autovacuum_naptime: 10
        datestyle: 'iso, mdy'
        max_locks_per_transaction: 128
        shared_preload_libraries: timescaledb
        timescaledb.telemetry_level: basic
databaseInitSQL:
  name: bootstrap-sql
  key: sql
pgBouncerConfig:
  config:
    global:
      max_client_conn: "200"
      server_tls_sslmode: disable # Required to allow non-tls connections
users:
  - name: postgres
    password:
      type: AlphaNumeric
openshift: false

instanceCPU: 1
instanceMemory: 2Gi

pgBackRestConfig:
  repos:
  - name: repo1
    schedules:
      full: "0 0 * * 6" # every Saturday at 00:00
      incremental: "*/30 * * * *" # every 30 minutes
    volume:
      volumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 35Gi
  global:
    repo1-retention-full: "14"
    repo1-retention-full-type: time

Non-working postgrescluster CR helm values:

name: "postgres"
postgresVersion: 14
postGISVersion: 3.1
monitoring: true
imagePostgres: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-14.3-3.1-0"
imagePgBackRest: "registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-1"
imagePgBouncer: "registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.16-3"
imageExporter: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.1.1-0"
imagePullSecrets:
  - name: "secret-x"
instanceName: postgres-instance
instanceSize: 30Gi
patroni:
  dynamicConfiguration:
    postgresql:
      pg_hba:
        - "hostnossl all all all md5" # Required to allow non-tls connections
      parameters:
        shared_buffers: 3140MB
        work_mem: 4MB
        maintenance_work_mem: 1570MB
        max_connections: 200
        effective_io_concurrency: 200
        max_worker_processes: 19
        max_parallel_workers_per_gather: 4
        wal_buffers: 16MB
        max_wal_size: 1GB
        min_wal_size: 512MB
        random_page_cost: 1.1
        effective_cache_size: 9421MB
        default_statistics_target: 500
        log_timezone: 'UTC'
        ssl: "off"
        autovacuum_max_workers: 10
        autovacuum_naptime: 10
        datestyle: 'iso, mdy'
        max_locks_per_transaction: 128
        shared_preload_libraries: timescaledb
        timescaledb.telemetry_level: basic
databaseInitSQL:
  name: bootstrap-sql
  key: sql
pgBouncerConfig:
  config:
    global:
      max_client_conn: "200"
      server_tls_sslmode: disable # Required to allow non-tls connections
users:
  - name: postgres
    password:
      type: AlphaNumeric
openshift: false

instanceCPU: 1
instanceMemory: 2Gi

pgBackRestConfig:
  repos:
  - name: repo1
    schedules:
      full: "0 0 * * 6"
      incremental: "0 */12 * * *"
    volume:
      volumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 35Gi
  - name: repo2
    s3:
      bucket: "postgres-backups"
      endpoint: "10.0.0.6:9000"
      region: "notusedbyminio"
  global:
    repo1-retention-full: "14"
    repo1-retention-full-type: time
    repo2-s3-uri-style: path
    repo2-storage-verify-tls: "n"
    repo2-storage-port: "9000"
    repo2-s3-key: "xxxxxxxxx"
    repo2-s3-key-secret: "yyyyyyy"
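
Before suspecting the operator, it can help to rule out a mistake in the values file itself. The sketch below is only an illustration: it takes the `pgBackRestConfig` section already parsed into a Python dict and lists any `repoN-...` option under `global` whose repo is not declared under `repos`. The function name `undeclared_repo_options` is hypothetical and is not part of PGO or the helm chart.

```python
import re


def undeclared_repo_options(pgbackrest_config: dict) -> list:
    """Return global option names whose repoN prefix has no matching entry in repos."""
    # Names such as "repo1" and "repo2" declared in the repos list.
    declared = {repo.get("name") for repo in pgbackrest_config.get("repos") or []}
    orphans = []
    for option in pgbackrest_config.get("global") or {}:
        # Options like "repo2-s3-uri-style" start with the repo name and a dash.
        match = re.match(r"(repo\d+)-", option)
        if match and match.group(1) not in declared:
            orphans.append(option)
    return orphans
```

An empty list means every repo-specific global option refers to a declared repo. A non-empty list points at options for a repo that the parsed values do not contain, for example when a repo entry ended up at the wrong nesting level.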

rmiguelac avatar Jul 28 '22 14:07 rmiguelac

Hello,

Do we have any suggestions?

rmiguelac avatar Aug 11 '22 12:08 rmiguelac

Hey @rmiguelac, that warning is expected when we run into an issue with stanza-create. As the cluster continues to reconcile, it should get into a better state and be able to create the stanza.

I replicated that log message by adding a new cloud backup repo in addition to an initial PV-based repo. The warning showed up a handful of times, and then I finally saw this message:

time="2022-08-23T14:08:20Z" level=debug msg=Normal file="sigs.k8s.io/[email protected]/pkg/internal/recorder/recorder.go:98" func="recorder.(*Provider).getBroadcaster.func1.1" message="pgBackRest stanza creation completed successfully" object="{PostgresCluster postgres-operator test-1 00462c52-5f3d-488f-ab0f-862c9394ea17 postgres-operator.crunchydata.com/v1beta1 3027734 }" reason=StanzasCreated version=latest

What does your postgrescluster status look like? It should show if stanzas have been created for each repo.

status:
  conditions:
  - lastTransitionTime: "2022-08-23T18:49:18Z"
    message: pgBackRest replica create repo is ready for backups
    observedGeneration: 2
    reason: StanzaCreated
    status: "True"
    type: PGBackRestReplicaRepoReady
  pgbackrest:
    repos:
    - bound: true
      name: repo1
      replicaCreateBackupComplete: true
      stanzaCreated: true
      volume: pvc-f4e97143-b710-4b8e-9d85-c724099f90cc
    - name: repo4
      repoOptionsHash: 6d6c5d6679
      stanzaCreated: true

jmckulk avatar Aug 23 '22 19:08 jmckulk