
[Question] postgresql cluster configuration issues with overriding CLONE_AZURE_STORAGE_ACCOUNT

Open mightymiracleman opened this issue 1 year ago • 0 comments

Please, answer some short questions which should help us to understand your problem / question better?

  • Which image of the operator are you using? e.g. registry.opensource.zalan.do/acid/postgres-operator:v1.10.1
    Currently using v1.10.1

  • Where do you run it - cloud or metal? Kubernetes or OpenShift? [AWS K8s | GCP ... | Bare Metal K8s]
    Cloud / Azure (AKS)

  • Are you running Postgres Operator in production? [yes | no]
    Not quite yet but very close.

  • Type of issue? [Bug report, question, feature request, etc.]
    Question

We currently configure backups for each of our clusters independently of the operator's config settings. Every cluster we create specifies/overrides the following parameters in the "env:" section of the postgresql resource, as mentioned in the documentation here:

  • AZURE_STORAGE_ACCOUNT
  • AZURE_STORAGE_ACCESS_KEY
  • WALG_AZ_PREFIX
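For illustration, a per-cluster override of this kind might look like the sketch below. Only the three variable names come from this issue; the cluster name, team, sizes, Postgres version, storage account, container path, and key placeholder are all hypothetical.

```yaml
# Sketch of a postgresql resource with per-cluster backup settings.
# Every name and value here is a placeholder, not taken from a real cluster.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-example-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 10Gi
  postgresql:
    version: "15"
  env:
    - name: AZURE_STORAGE_ACCOUNT
      value: "examplebackupaccount"
    - name: AZURE_STORAGE_ACCESS_KEY
      value: "<storage-access-key>"
    - name: WALG_AZ_PREFIX
      value: "azure://example-container/spilo/acid-example-cluster"
```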

These changes work well for pushing backups to the correct Azure storage account/container/path; however, I encountered issues with cloning. When I specify CLONE_AZURE_STORAGE_ACCOUNT and CLONE_WALG_AZ_PREFIX in the "env:" section of the postgresql resource, the PREFIX value is respected and used. However, CLONE_AZURE_STORAGE_ACCOUNT uses the value set at the operator level (specified in the OperatorConfiguration).
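A minimal sketch of the clone-related part of the "env:" section, assuming placeholder values for the source storage account and path (neither appears in this issue):

```yaml
# Fragment of spec.env on the cluster being created as a clone.
# Values are hypothetical placeholders.
spec:
  env:
    # Reported here as being replaced by the operator-level value.
    - name: CLONE_AZURE_STORAGE_ACCOUNT
      value: "examplesourceaccount"
    # Reported here as being respected.
    - name: CLONE_WALG_AZ_PREFIX
      value: "azure://example-container/spilo/acid-source-cluster"
```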

I noticed in the documentation that only values with WAL and LOG prefixes can be overridden from the operator values. While I expected CLONE_AZURE_STORAGE_ACCOUNT to be overridable, it seems that the precedence of values is enforced, as mentioned in the linked docs above. I found this by looking through the pod logs, which show the endpoint it was using for Azure.

To work around this, I removed the value of the "wal_az_storage_account" setting from the OperatorConfiguration. This allowed the value I set in the "env" section of the postgresql resource to persist. However, during cloning, an error is logged: "cannot figure out S3 or GS bucket or AZ storage account. All options are empty in the config." I tracked the error message down to this area of the postgres operator code (it looks like this is where the operator sets the environment variables).
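The workaround can be sketched as an OperatorConfiguration in which the setting is simply absent. The resource name is hypothetical, and the placement of "wal_az_storage_account" under the aws_or_gcp section is an assumption that should be checked against the CRD for the operator version in use.

```yaml
# Sketch of the workaround: the global Azure storage account is left unset.
# Resource name is hypothetical; key placement is an assumption to verify.
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  # wal_az_storage_account is intentionally absent from this section,
  # so the per-cluster value from the postgresql "env:" section persists.
  aws_or_gcp: {}
```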

Despite the error, as far as I can tell, the cluster cloned from the specified location without any other issues. Our use case dictates that we would never use those global settings, as each of our clusters requires a different storage account. It might be possible for us to create a centralized storage account for cloning and copy over each cluster's WAL data, but that would not be ideal.

Is this workaround acceptable? Is there any danger in not setting that value at a global level? So far, everything has been working as expected. I'm assuming that because "wal_az_storage_account" was not set in the OperatorConfiguration resource, the value I used in the "env:" section of the postgresql cluster was allowed to persist as an environment variable (I did confirm the values in the Spilo pod).

Thanks for your hard work on this project; we are excited to use it.

Some general remarks when posting a bug report:

  • Please, check the operator, pod (Patroni) and postgresql logs first. When copy-pasting many log lines please do it in a separate GitHub gist together with your Postgres CRD and configuration manifest.
  • If you feel this issue might be more related to the Spilo docker image or Patroni, consider opening issues in the respective repos.

mightymiracleman, Feb 17 '24 17:02