postgres-operator icon indicating copy to clipboard operation
postgres-operator copied to clipboard

Is there any way to disable archiving?

Open mzwettler2 opened this issue 3 years ago • 23 comments

I found PGO making archiving mandatory:

        // Send WAL files to all configured repositories when not in recovery.
        // - https://pgbackrest.org/user-guide.html#quickstart/configure-archiving
        // - https://pgbackrest.org/command.html#command-archive-push
        // - https://www.postgresql.org/docs/current/runtime-config-wal.html
        archive := `pgbackrest --stanza=` + DefaultStanzaName + ` archive-push "%p"`
        outParameters.Mandatory.Add("archive_mode", "on")
        outParameters.Mandatory.Add("archive_command", archive)

This also requires having a repo retention and at least on scheduled backup to get automatic retention management working.

There might be cases where I do not want archiving, eg. loading a large amount of data. There might be even cases where I do not want archiving nor backups, eg. some kind of temporary test systems.

QUESTION: Is there any way to disable archiving?

Environment

Please provide the following details:

  • Platform: Anthos
  • Platform Version: 1.10 (afaik)
  • PGO Image Tag: postgres-operator:ubi8-5.1.1-0
  • Postgres Version: 14
  • Storage: vm ware csi
  • Number of Postgres clusters: several

mzwettler2 avatar Oct 24 '22 10:10 mzwettler2

I also find it quite odd that I have to enable backups

laurivosandi avatar Nov 02 '22 14:11 laurivosandi

Any solutions found?

nfzv avatar Nov 08 '22 08:11 nfzv

I mangled the CRD definition and set minIndex: 0 in the relevant section :D

laurivosandi avatar Nov 08 '22 08:11 laurivosandi

@laurivosandi

What did you do exactly? Hacking the PGO sources?

mzwettler2 avatar Nov 08 '22 10:11 mzwettler2

@cbandy

Maybe this would be an enhancement request?

mzwettler2 avatar Nov 14 '22 14:11 mzwettler2

JFYI: After hacking the CRD-s looks like WAL keeps growing forever :D

So yeah authors should address this issue themselves

laurivosandi avatar Dec 15 '22 13:12 laurivosandi

Just dropping a line here to note that we have a ticket in our backlog for disabling backups/archiving (for dev envs only, please!)

(This is a feature that has been requested in issues a few times, see also https://github.com/CrunchyData/postgres-operator/issues/2531 https://github.com/CrunchyData/postgres-operator/issues/3152 I'm adding these links to help consolidate this issue in case I close any of these tickets.)

benjaminjb avatar Jan 03 '23 19:01 benjaminjb

What is the timeline for this feature? Do you have an estimate on when this will be available?

Pluggi avatar Jan 25 '23 11:01 Pluggi

Any update on this?

sigwinch28 avatar Mar 09 '23 13:03 sigwinch28

I've found a way to do this temporarily:

kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'off'
pg_ctl restart

It's very useful when my initial datasource is a big pg_dump file.

After turning archive_mode back on, kicking-off a full backup seems like a good idea:

kubectl create job --from=cronjob/hippo-repo1-full backup-post-import

bendilley avatar May 04 '23 23:05 bendilley

Just move on to https://cloudnative-pg.io/

laurivosandi avatar May 05 '23 05:05 laurivosandi

@bendilley

I also tried this but it did not work for me. PGO quickly noticed the change and restored the original state with archive_mode='on'. Maybe this might be deferred with the new "pause" flag of PGO 5.2.

Anyway, it is not useful to trick this outside of PGO or Kubernetes.

mzwettler2 avatar May 05 '23 07:05 mzwettler2

@mzwettler2 this was with PGO 5.3.0 but I didn't explicitly pause anything.

I do think it's surprisingly difficult to initialise a cluster from a non-PGO datasource. I've even tried restoring from a pgbackrest backup in S3, but because the original source wasn't PGO-managed I still couldn't get it to work as a dataSource. pg_restore does work, but a large dataset will fill the disk with transaction logs way faster than they can be streamed by pgbackrest, at least to S3.

@laurivosandi thanks for the link - I'm open to alternatives!

bendilley avatar May 05 '23 10:05 bendilley

Now that I think about it, I did also merge the below patch of the patroni dynamicConfiguration in an earlier experiment - it didn't appear to have any effect, but it was still in place when I manually turned off archive_mode, which might explain why I got away with it 🤷🏻

spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          archive_mode: 'off'

bendilley avatar May 05 '23 16:05 bendilley

@bendilley

At the top of this thread you see that "archive_mode: on" is hardcoded within PGO. It was probably just a coincidence for you.

mzwettler2 avatar May 08 '23 07:05 mzwettler2

Perhaps I should ask this in a different thread, but how are you supposed to initialise a PGO database from a non-PGO datasource??

bendilley avatar May 09 '23 09:05 bendilley

@bendilley

Yes, somewhat off topic here. Anyway, I would go with pg_dump/pg_restore. Seems to be the cleanest way for me.

mzwettler2 avatar May 09 '23 10:05 mzwettler2

pg_restore works fine for smaller datasets, but migrating a large database will fill up the disk with WAL faster than it can be backed-up, at least that's my experience.

bendilley avatar May 09 '23 10:05 bendilley

@bendilley

Yep. The only "stable" dirty hack I found to prevent for a filesystem full error has been soft-linking a subdirectory of the "/pgbackrest/repo1/archive/db" on the repo pod to "/dev/null". PostgreSQL generates archives and saves them to trash. PGO does not recognize nor undo this filesystem "change".

mzwettler2 avatar May 09 '23 10:05 mzwettler2

Any news regarding this issue?

johannes-engler-mw avatar Sep 05 '23 07:09 johannes-engler-mw

@benjaminjb

We are hopeing and praying for this feature. When will it come?

mzwettler2 avatar Jun 12 '24 09:06 mzwettler2

Further to my earlier observation, I've found this combination of steps to be consistently effective:

kubectl patch postgrescluster hippo --type merge --patch-file patch-patroni-archive-mode.yml
kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'off'
pg_ctl restart
exit

where patch-patroni-archive-mode.yml is

spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          archive_mode: 'off'

I've only ever done it while pg_restoreing a dump file. Presumably the setting wouldn't persist if the container was restarted.

After the restore is complete, I just reverse the changes and kick-off a full backup:

kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'on'
pg_ctl restart
exit

kubectl patch postgrescluster hippo --type json -p '[{"op": "remove", "path": "/spec/patroni/dynamicConfiguration/postgresql/parameters/archive_mode"}]'
kubectl create job --from=cronjob/hippo-repo1-full backup-post-import

bendilley avatar Jun 12 '24 13:06 bendilley

Just dropping a quick update on this thread to note that we are currently taking a look at some upcoming plans and changes to CPK to allow allow additional flexibility around disabling archiving (as requested in this issue).

Therefore, I simply wanted to note that this issue is definitely on our radar, and we will be in touch with additional information, details, etc. once available.

andrewlecuyer avatar Jul 09 '24 17:07 andrewlecuyer