Cloning a Postgres Cluster with the pgbackrest option repo1-storage-verify-tls: "y" fails

Open RogierO opened this issue 3 years ago • 2 comments

Overview

For pgbackrest to be able to communicate with our S3 (Netapp) storage, we have included the necessary certificates and have set repo1-storage-ca-file: "/etc/pgbackrest/conf.d/certs.pem" and repo1-storage-verify-tls: "y". With these two pgbackrest options I'm able to back up and restore our Postgres 13 Cluster without any problems.

But when I try to clone a Postgres Cluster, either from the running Postgres Cluster or from the backups stored in the S3 storage, the restore pods give the following error: ERROR: [095]: unable to set user-defined CA certificate location.

Once I set repo1-storage-verify-tls: "n" and apply the configuration again, the clone gets restored right away without any issues. After that, regular backups and restores only work while repo1-storage-verify-tls stays set to "n". If I set repo1-storage-verify-tls to "y", the backups start failing with: ERROR: [095]: unable to set user-defined CA certificate location.

Environment

  • Platform: Kubernetes in combination with Rancher
  • Platform Version: Rancher 2.6.3 with Kubernetes v1.21.8-rancher2-1
  • PGO Image Tag: ubi8-5.0.5-0
  • Postgres Version: 13
  • Storage: Trident & Netapp ONTAP
  • Backup Storage: S3

Steps to Reproduce

REPRO

Here is the Postgres Cluster definition that I'm using to clone a Postgres Cluster from S3 storage:

postgrescluster.yaml:

---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: clonedb
spec:
  postgresVersion: 13
  dataSource:
    pgbackrest:
      configuration:
      - secret:
          name: pgo-s3-creds
      stanza: db
      global:
        repo1-path: "/pgbackrest/testdb"
        repo1-s3-port: "443"
        repo1-s3-uri-style: "path"
        repo1-storage-ca-file: "/etc/pgbackrest/conf.d/certs.pem"
        repo1-storage-verify-tls: "y"
      repo:
        name: repo1
        s3:
          bucket: "my-bucket"
          endpoint: "my-endpoint"
          region: "my-region"
      options:
      - --type=immediate
  instances:
    - name: instance1
      replicas: 3
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      resources:
        limits:
          cpu: 100m
          memory: 100Mi
        requests:
          cpu: 50m
          memory: 50Mi
      sidecars:
        replicaCertCopy:
          resources:
            limits:
              cpu: 100m
              memory: 100Mi
            requests:
              cpu: 50m
              memory: 50Mi
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          timezone: 'Europe/Amsterdam'
  backups:
    pgbackrest:
      configuration:
      - secret:
          name: pgo-s3-creds
      repos:
      - name: repo1
        s3:
          bucket: "my-bucket"
          endpoint: "my-endpoint"
          region: "my-region"
        schedules:
          full: "0 1 * * *"
          incremental: "0 */1 * * *"
      global:
        repo1-retention-full: "5"
        repo1-retention-full-type: time
        repo1-s3-port: "443"
        repo1-s3-uri-style: "path"
        repo1-storage-ca-file: "/etc/pgbackrest/conf.d/certs.pem"
        repo1-storage-verify-tls: "y"
        repo1-type: "s3"
        repo1-path: "/pgbackrest/clonedb"
      manual:
        repoName: repo1
        options:
         - --type=full

kustomization.yaml:

namespace: my-cluster

secretGenerator:
- name: pgo-s3-creds
  files:
  - s3-ca-certificate.pem
  - s3.conf

generatorOptions:
  disableNameSuffixHash: true

resources:
- postgrescluster.yaml

s3.conf:

[global]
repo1-s3-key=my-key
repo1-s3-key-secret=my-key-secret

And of course a file named s3-ca-certificate.pem with the certificates in it.

Apply the above code with:

kubectl apply -n my-cluster -k .

This will result in the restore pod throwing errors.

Setting repo1-storage-verify-tls to "n" within the dataSource part makes the restore pod run and restore the Postgres Cluster. But starting a backup on this new cluster fails again.

Setting repo1-storage-verify-tls to "n" within the backups part as well allows creating backups of this new Postgres Cluster again.

EXPECTED

The new Postgres Cluster created by cloning from S3 storage should be up and running, with pgbackrest accepting the included certificates and repo1-storage-verify-tls: "y". Backups with pgbackrest should also succeed.

ACTUAL

The restore pod that gets deployed to the namespace shows errors about not being able to find the CA certificate location, which leads to many restore pods showing the same error and no working Postgres Cluster.

Logs

Log of the restore pod when repo1-storage-verify-tls is set to "y" within both the dataSource and backups parts of postgrescluster.yaml:

WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg13' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
ERROR: [095]: unable to set user-defined CA certificate location: [33558530] No such file or directory

Log of the postgres operator when trying to back up after repo1-storage-verify-tls is set to "n" within the dataSource part and to "y" within the backups part of postgrescluster.yaml:

time="2022-04-08T08:35:52Z" level=error msg="unable to create stanza" error="command terminated with exit code 95: ERROR: [095]: unable to set user-defined CA certificate location: [33558530] No such file or directory\n" file="internal/controller/postgrescluster/pgbackrest.go:2578" func="postgrescluster.(*Reconciler).reconcileStanzaCreate" name=clonedb namespace=dba-postgrescluster-clonedb-s3 reconciler=pgBackRest reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.0.5-0
time="2022-04-08T08:35:52Z" level=debug msg=Warning file="sigs.k8s.io/[email protected]/pkg/internal/recorder/recorder.go:98" func="recorder.(*Provider).getBroadcaster.func1.1" message="command terminated with exit code 95: ERROR: [095]: unable to set user-defined CA certificate location: [33558530] No such file or directory\n" object="{PostgresCluster dba-postgrescluster-clonedb-s3 clonedb 17c02c9d-ab83-49bd-9a2e-737a6a2764bd postgres-operator.crunchydata.com/v1beta1 21792696 }" reason=UnableToCreateStanzas version=5.0.5-0

Can you please help me fix this issue? We would like to keep using pgbackrest with TLS validation on.

RogierO • Apr 08 '22 11:04

Having a look at your kustomization file and your config makes me wonder about mismatched certificate names:

kustomization.yaml generates: s3-ca-certificate.pem

cluster config uses: repo1-storage-ca-file: "/etc/pgbackrest/conf.d/certs.pem"

Shouldn't both be the same?

dzabel • Jun 12 '23 15:06
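
If the name mismatch is indeed the cause, one possible fix is to point repo1-storage-ca-file at the file name that the secret actually contains. The fragment below is a minimal sketch, not a verified fix. It assumes that the files in pgo-s3-creds are mounted into /etc/pgbackrest/conf.d under their original names, which is what the path in the cluster definition suggests. Only the changed keys are shown; everything else in postgrescluster.yaml stays as posted.

```yaml
# postgrescluster.yaml (fragment): only the changed keys are shown.
# Assumption: the files of the pgo-s3-creds secret appear in
# /etc/pgbackrest/conf.d under the names listed in kustomization.yaml.
spec:
  dataSource:
    pgbackrest:
      global:
        repo1-storage-ca-file: "/etc/pgbackrest/conf.d/s3-ca-certificate.pem"
        repo1-storage-verify-tls: "y"
  backups:
    pgbackrest:
      global:
        repo1-storage-ca-file: "/etc/pgbackrest/conf.d/s3-ca-certificate.pem"
        repo1-storage-verify-tls: "y"
```

Alternatively, the cluster definition could stay as it is and the secret key could be renamed instead: the kustomize secretGenerator accepts entries of the form certs.pem=s3-ca-certificate.pem under files, which would store the certificate under the key certs.pem.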