postgres-operator Is there any way to disable archiving?

I found PGO making archiving mandatory:

        // Send WAL files to all configured repositories when not in recovery.
        // - https://pgbackrest.org/user-guide.html#quickstart/configure-archiving
        // - https://pgbackrest.org/command.html#command-archive-push
        // - https://www.postgresql.org/docs/current/runtime-config-wal.html
        archive := `pgbackrest --stanza=` + DefaultStanzaName + ` archive-push "%p"`
        outParameters.Mandatory.Add("archive_mode", "on")
        outParameters.Mandatory.Add("archive_command", archive)

This also requires having a repo retention and at least on scheduled backup to get automatic retention management working.

There might be cases where I do not want archiving, eg. loading a large amount of data. There might be even cases where I do not want archiving nor backups, eg. some kind of temporary test systems.

QUESTION: Is there any way to disable archiving?

Environment

Please provide the following details:

Platform: Anthos
Platform Version: 1.10 (afaik)
PGO Image Tag: postgres-operator:ubi8-5.1.1-0
Postgres Version: 14
Storage: vm ware csi
Number of Postgres clusters: several

Oct 24 '22 10:10 mzwettler2

I also find it quite odd that I have to enable backups

Nov 02 '22 14:11 laurivosandi

Any solutions found?

Nov 08 '22 08:11 nfzv

I mangled the CRD definition and set minIndex: 0 in the relevant section :D

Nov 08 '22 08:11 laurivosandi

@laurivosandi

What did you do exactly? Hacking the PGO sources?

Nov 08 '22 10:11 mzwettler2

@cbandy

Maybe this would be an enhancement request?

Nov 14 '22 14:11 mzwettler2

JFYI: After hacking the CRD-s looks like WAL keeps growing forever :D

So yeah authors should address this issue themselves

Dec 15 '22 13:12 laurivosandi

Just dropping a line here to note that we have a ticket in our backlog for disabling backups/archiving (for dev envs only, please!)

(This is a feature that has been requested in issues a few times, see also https://github.com/CrunchyData/postgres-operator/issues/2531 https://github.com/CrunchyData/postgres-operator/issues/3152 I'm adding these links to help consolidate this issue in case I close any of these tickets.)

Jan 03 '23 19:01 benjaminjb

What is the timeline for this feature? Do you have an estimate on when this will be available?

Jan 25 '23 11:01 Pluggi

Any update on this?

Mar 09 '23 13:03 sigwinch28

I've found a way to do this temporarily:

kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'off'
pg_ctl restart

It's very useful when my initial datasource is a big pg_dump file.

After turning archive_mode back on, kicking-off a full backup seems like a good idea:

kubectl create job --from=cronjob/hippo-repo1-full backup-post-import

May 04 '23 23:05 bendilley

Just move on to https://cloudnative-pg.io/

May 05 '23 05:05 laurivosandi

@bendilley

I also tried this but it did not work for me. PGO quickly noticed the change and restored the original state with archive_mode='on'. Maybe this might be deferred with the new "pause" flag of PGO 5.2.

Anyway, it is not useful to trick this outside of PGO or Kubernetes.

May 05 '23 07:05 mzwettler2

@mzwettler2 this was with PGO 5.3.0 but I didn't explicitly pause anything.

I do think it's surprisingly difficult to initialise a cluster from a non-PGO datasource. I've even tried restoring from a pgbackrest backup in S3, but because the original source wasn't PGO-managed I still couldn't get it to work as a dataSource. pg_restore does work, but a large dataset will fill the disk with transaction logs way faster than they can be streamed by pgbackrest, at least to S3.

@laurivosandi thanks for the link - I'm open to alternatives!

May 05 '23 10:05 bendilley

Now that I think about it, I did also merge the below patch of the patroni dynamicConfiguration in an earlier experiment - it didn't appear to have any effect, but it was still in place when I manually turned off archive_mode, which might explain why I got away with it 🤷🏻

spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          archive_mode: 'off'

May 05 '23 16:05 bendilley

@bendilley

At the top of this thread you see that "archive_mode: on" is hardcoded within PGO. It was probably just a coincidence for you.

May 08 '23 07:05 mzwettler2

Perhaps I should ask this in a different thread, but how are you supposed to initialise a PGO database from a non-PGO datasource??

May 09 '23 09:05 bendilley

@bendilley

Yes, somewhat off topic here. Anyway, I would go with pg_dump/pg_restore. Seems to be the cleanest way for me.

May 09 '23 10:05 mzwettler2

pg_restore works fine for smaller datasets, but migrating a large database will fill up the disk with WAL faster than it can be backed-up, at least that's my experience.

May 09 '23 10:05 bendilley

@bendilley

Yep. The only "stable" dirty hack I found to prevent for a filesystem full error has been soft-linking a subdirectory of the "/pgbackrest/repo1/archive/db" on the repo pod to "/dev/null". PostgreSQL generates archives and saves them to trash. PGO does not recognize nor undo this filesystem "change".

May 09 '23 10:05 mzwettler2

Any news regarding this issue?

Sep 05 '23 07:09 johannes-engler-mw

@benjaminjb

We are hopeing and praying for this feature. When will it come?

Jun 12 '24 09:06 mzwettler2

Further to my earlier observation, I've found this combination of steps to be consistently effective:

kubectl patch postgrescluster hippo --type merge --patch-file patch-patroni-archive-mode.yml
kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'off'
pg_ctl restart
exit

where patch-patroni-archive-mode.yml is

spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          archive_mode: 'off'

I've only ever done it while pg_restoreing a dump file. Presumably the setting wouldn't persist if the container was restarted.

After the restore is complete, I just reverse the changes and kick-off a full backup:

kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'on'
pg_ctl restart
exit

kubectl patch postgrescluster hippo --type json -p '[{"op": "remove", "path": "/spec/patroni/dynamicConfiguration/postgresql/parameters/archive_mode"}]'
kubectl create job --from=cronjob/hippo-repo1-full backup-post-import

Jun 12 '24 13:06 bendilley

Just dropping a quick update on this thread to note that we are currently taking a look at some upcoming plans and changes to CPK to allow allow additional flexibility around disabling archiving (as requested in this issue).

Therefore, I simply wanted to note that this issue is definitely on our radar, and we will be in touch with additional information, details, etc. once available.

Jul 09 '24 17:07 andrewlecuyer