Is there any way to disable archiving?
I found that PGO makes archiving mandatory:
// Send WAL files to all configured repositories when not in recovery.
// - https://pgbackrest.org/user-guide.html#quickstart/configure-archiving
// - https://pgbackrest.org/command.html#command-archive-push
// - https://www.postgresql.org/docs/current/runtime-config-wal.html
archive := `pgbackrest --stanza=` + DefaultStanzaName + ` archive-push "%p"`
outParameters.Mandatory.Add("archive_mode", "on")
outParameters.Mandatory.Add("archive_command", archive)
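To double-check what actually reaches the server, the settings can be queried on the primary (a sketch; it assumes a single cluster in the current namespace):
# find the current primary via its PGO role label and query both parameters
kubectl exec -it "$(kubectl get pod -l postgres-operator.crunchydata.com/role=master -o name)" \
  -c database -- psql -c "SELECT name, setting FROM pg_settings WHERE name IN ('archive_mode', 'archive_command');"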
This also requires configuring repo retention and at least one scheduled backup to get automatic retention management working.
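For context, that looks roughly like this in the PostgresCluster spec (option names are from the pgBackRest/PGO docs; the values are just examples):
spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "2"   # keep the two most recent full backups
      repos:
      - name: repo1
        schedules:
          full: "0 1 * * 0"         # weekly full backup, Sundays at 01:00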
There might be cases where I do not want archiving, e.g. when loading a large amount of data. There might even be cases where I want neither archiving nor backups, e.g. for temporary test systems.
QUESTION: Is there any way to disable archiving?
Environment
Please provide the following details:
- Platform: Anthos
- Platform Version: 1.10 (afaik)
- PGO Image Tag: postgres-operator:ubi8-5.1.1-0
- Postgres Version: 14
- Storage: VMware CSI
- Number of Postgres clusters: several
I also find it quite odd that I have to enable backups
Any solutions found?
I mangled the CRD definition and set minIndex: 0 in the relevant section :D
@laurivosandi
What did you do exactly? Hacking the PGO sources?
@cbandy
Maybe this would be an enhancement request?
JFYI: after hacking the CRDs, it looks like the WAL keeps growing forever :D
So yeah, the authors should address this issue themselves.
Just dropping a line here to note that we have a ticket in our backlog for disabling backups/archiving (for dev envs only, please!)
(This is a feature that has been requested a few times; see also https://github.com/CrunchyData/postgres-operator/issues/2531 and https://github.com/CrunchyData/postgres-operator/issues/3152. I'm adding these links to help consolidate the issue in case I close any of those tickets.)
What is the timeline for this feature? Do you have an estimate on when this will be available?
Any update on this?
I've found a way to do this temporarily:
kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'off'
pg_ctl restart
It's very useful when my initial datasource is a big pg_dump file.
After turning archive_mode back on, kicking off a full backup seems like a good idea:
kubectl create job --from=cronjob/hippo-repo1-full backup-post-import
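The same toggle can also be scripted non-interactively, e.g. (a sketch that assumes postgresql.conf already contains an archive_mode line):
# flip archive_mode off and restart; -l keeps the exec session from hanging on server output
kubectl exec hippo-data-ljc7-0 -c database -- bash -c \
  "sed -i \"s/^archive_mode.*/archive_mode = 'off'/\" /pgdata/pg14/postgresql.conf && pg_ctl restart -l /tmp/pg_restart.log"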
Just move on to https://cloudnative-pg.io/
@bendilley
I also tried this, but it did not work for me. PGO quickly noticed the change and restored the original state with archive_mode='on'. Maybe this could be deferred with the new "pause" flag in PGO 5.2.
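For reference, if I read the 5.2 release notes correctly, pausing reconciliation would look like this (untested, so treat it as a sketch):
spec:
  paused: true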
Anyway, tricks applied outside of PGO or Kubernetes are not really useful.
@mzwettler2 this was with PGO 5.3.0 but I didn't explicitly pause anything.
I do think it's surprisingly difficult to initialise a cluster from a non-PGO datasource. I've even tried restoring from a pgbackrest backup in S3, but because the original source wasn't PGO-managed I still couldn't get it to work as a dataSource. pg_restore does work, but a large dataset will fill the disk with transaction logs way faster than they can be streamed by pgbackrest, at least to S3.
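For reference, the kind of cloud dataSource I mean looks roughly like this (stanza, secret, and bucket details are placeholders based on the PGO restore docs):
spec:
  dataSource:
    pgbackrest:
      stanza: db
      configuration:
      - secret:
          name: pgo-s3-creds      # pgBackRest S3 credentials
      global:
        repo1-path: /pgbackrest/repo1
      repo:
        name: repo1
        s3:
          bucket: my-bucket
          endpoint: s3.amazonaws.com
          region: us-east-1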
@laurivosandi thanks for the link - I'm open to alternatives!
Now that I think about it, I did also merge the patch below into the patroni dynamicConfiguration in an earlier experiment. It didn't appear to have any effect, but it was still in place when I manually turned off archive_mode, which might explain why I got away with it 🤷🏻
spec:
patroni:
dynamicConfiguration:
postgresql:
parameters:
archive_mode: 'off'
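A quick way to check whether such a patch actually reached Postgres (pod name as in my earlier comment):
kubectl exec hippo-data-ljc7-0 -c database -- psql -Atc "SHOW archive_mode;"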
@bendilley
At the top of this thread you can see that archive_mode: on is hardcoded within PGO. That it worked for you was probably just a coincidence.
Perhaps I should ask this in a different thread, but how are you supposed to initialise a PGO database from a non-PGO datasource??
@bendilley
Yes, somewhat off topic here. Anyway, I would go with pg_dump/pg_restore. Seems to be the cleanest way for me.
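Something along these lines (hostnames, pod, and database names are placeholders):
# dump from the source server, copy the dump into the primary pod, then restore
pg_dump -Fc -h old-db.example.com -U postgres mydb > /tmp/mydb.dump
kubectl cp /tmp/mydb.dump hippo-data-ljc7-0:/tmp/mydb.dump -c database
kubectl exec hippo-data-ljc7-0 -c database -- pg_restore --create -d postgres /tmp/mydb.dump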
pg_restore works fine for smaller datasets, but migrating a large database will fill up the disk with WAL faster than it can be backed up, at least that's my experience.
@bendilley
Yep. The only "stable" dirty hack I found to prevent a filesystem-full error has been soft-linking a subdirectory of /pgbackrest/repo1/archive/db on the repo pod to /dev/null. PostgreSQL keeps generating archives, but they go straight to the trash. PGO neither recognizes nor undoes this filesystem "change".
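Roughly like this (pod, container, and archive-id directory names vary per cluster, so check with ls first):
kubectl exec -it hippo-repo-host-0 -c pgbackrest -- bash
cd /pgbackrest/repo1/archive/db
mv 14-1 14-1.orig      # move the real archive-id subdirectory aside
ln -s /dev/null 14-1   # archived WAL now effectively goes to the trash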
Any news regarding this issue?
@benjaminjb
We are hoping and praying for this feature. When will it come?
Further to my earlier observation, I've found this combination of steps to be consistently effective:
kubectl patch postgrescluster hippo --type merge --patch-file patch-patroni-archive-mode.yml
kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'off'
pg_ctl restart
exit
where patch-patroni-archive-mode.yml is
spec:
patroni:
dynamicConfiguration:
postgresql:
parameters:
archive_mode: 'off'
I've only ever done it while restoring a dump file with pg_restore. Presumably the setting wouldn't persist if the container was restarted.
After the restore is complete, I just reverse the changes and kick-off a full backup:
kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'on'
pg_ctl restart
exit
kubectl patch postgrescluster hippo --type json -p '[{"op": "remove", "path": "/spec/patroni/dynamicConfiguration/postgresql/parameters/archive_mode"}]'
kubectl create job --from=cronjob/hippo-repo1-full backup-post-import
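Optionally, wait for the backup job to complete before relying on it:
kubectl wait --for=condition=complete job/backup-post-import --timeout=30m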
Just dropping a quick update on this thread to note that we are currently looking at some upcoming plans and changes to CPK to allow additional flexibility around disabling archiving (as requested in this issue).
In short, I simply wanted to note that this issue is definitely on our radar, and we will be in touch with additional information and details once available.