Cluster read-only
To give some context: I have PGO v5.0.3 installed on a GCP Kubernetes cluster with 3 nodes (1 primary & 2 replicas). Today we had an issue where one of the VMs had its CPU at 100%, and the replica hosted on that VM broke and kept crashing with these logs:
2022-10-12 09:36:35,118 INFO: doing crash recovery in a single user mode
2022-10-12 09:36:35,683 ERROR: Crash recovery finished with code=-6
2022-10-12 09:36:35,684 INFO: stdout=
2022-10-12 09:36:35,684 INFO: stderr=2022-10-12 09:36:35.145 GMT [10595] LOG: database system was interrupted; last known up at 2022-10-12 06:59:03 GMT
2022-10-12 09:36:35.681 GMT [10595] LOG: could not read from log segment 00000028000004B50000006A, offset 0: read 0 of 8192
2022-10-12 09:36:35.681 GMT [10595] LOG: invalid primary checkpoint record
2022-10-12 09:36:35.681 GMT [10595] PANIC: could not locate a valid checkpoint record
Since then, our cluster has been read-only: every SELECT works, but every INSERT or UPDATE we try through pgAdmin returns a timeout.
The replica that is failing was the master before the crash occurred, which may be relevant. Also, Patroni is running in synchronous mode with this config:
patroni:
  dynamicConfiguration:
    postgresql:
      pg_hba:
        - "hostnossl all all all md5"
      parameters:
        synchronous_commit: "on"
        synchronous_standby_names: "*"
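For completeness, this block sits under spec in our PostgresCluster manifest, roughly as sketched below (the metadata.name is inferred from the pod names; the rest of the spec, such as instances, storage, and backups, is omitted here):

# Sketch only: shows where the Patroni dynamic configuration lives in the
# PostgresCluster resource managed by PGO v5. Cluster name is an assumption.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: postgres-adisoft-prod
spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        pg_hba:
          - "hostnossl all all all md5"
        parameters:
          synchronous_commit: "on"
          synchronous_standby_names: "*"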
As stated in the Patroni documentation:
If followers become inaccessible from the leader, the leader effectively becomes read-only.
So it seems that is what is going on? But it also states:
When using PostgreSQL synchronous replication, use at least three Postgres data nodes to ensure write availability if one host fails.
And there are 3 nodes, with 2 of the 3 healthy; see patronictl list:
+----------------------------------------+-------------------------------------------------------------------+---------+--------------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: postgres-adisoft-prod-ha (7093121775919198297) ---------------------------------------------------+---------+--------------+----+-----------+
| postgres-adisoft-prod-instance1-49s5-0 | postgres-adisoft-prod-instance1-49s5-0.postgres-adisoft-prod-pods | Replica | running | 39 | 1955 |
| postgres-adisoft-prod-instance1-v52w-0 | postgres-adisoft-prod-instance1-v52w-0.postgres-adisoft-prod-pods | Replica | start failed | | unknown |
| postgres-adisoft-prod-instance1-w687-0 | postgres-adisoft-prod-instance1-w687-0.postgres-adisoft-prod-pods | Leader | running | 40 | |
+----------------------------------------+-------------------------------------------------------------------+---------+--------------+----+-----------+
I did a patronictl reinit, the replica recovered, and the cluster became writable again. Still, my question is: why did it become read-only when 2 of the 3 nodes were still running? I would like to avoid this kind of situation in the future.
Thank you!
It happened again today. One of the VMs hosting a replica was shut down by the GKE auto-scaler, and until that replica came back up, every write failed. The only difference is that this time I didn't need to run patronictl reinit, as the pod recovered gracefully.
@Martin-Hogge just wanted to circle back to see if you're still having any issues with synchronous replication.
Looking at the logs you provided, it appears as though neither replica was healthy. More specifically, while one replica was clearly crashing, the other was lagging behind the primary (and was also on a different timeline). So my hunch here is that Patroni was not able to use the remaining replica for synchronous replication because it was unhealthy as well.
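As an aside, if the goal is to keep writes available when a synchronous standby is lost, you may want to let Patroni manage synchronous replication itself via its synchronous_mode setting, rather than setting synchronous_standby_names directly in the Postgres parameters. In that mode Patroni relaxes the synchronous requirement when no healthy standby is available instead of letting commits block. A rough, untested sketch of what that could look like in the spec (please double-check the keys against the Patroni and CPK docs for your versions):

patroni:
  dynamicConfiguration:
    # Let Patroni manage synchronous replication and the standby list.
    synchronous_mode: true
    # Keep this false (the default) so Patroni may disable synchronous
    # replication when no healthy standby remains, trading some durability
    # for write availability.
    synchronous_mode_strict: false
    postgresql:
      parameters:
        synchronous_commit: "on"
      # Note: do not set synchronous_standby_names here; Patroni manages it
      # when synchronous_mode is enabled.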
Additionally, I'll note that the version of Patroni in the latest version of CPK (Patroni v3.1.2 as of CPK v5.5.1) includes many fixes and updates made since this issue was submitted. I therefore also recommend testing sync replication with the latest CPK release if you haven't done so already.
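For the auto-scaler scenario in particular, more recent CPK releases also let you set minAvailable on an instance set so that PGO creates a PodDisruptionBudget, which the GKE cluster autoscaler should respect before draining a node hosting one of your Postgres pods. Roughly (field name from memory; please verify against the CRD reference for your release):

spec:
  instances:
    - name: instance1
      replicas: 3
      # Ask PGO to create a PodDisruptionBudget so that voluntary
      # disruptions (e.g. autoscaler scale-down) leave at least two
      # pods of this instance set running.
      minAvailable: 2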
Considering the age of this issue and that it pertains to a much older (unmaintained) version of CPK, I am going to proceed with closing. However, feel free to update if you are still having trouble, or continue the conversation via the PGO project's community Discord server.