How to recover from a current/assigned state combination without a possible state transition path?
Consider the following state we encountered today during testing:
Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
--------+-------+--------------+------------+--------------+---------------------+--------------------
db002-2 | 4 | db002-2:5432 | 0/6709DFE8 | read-write | wait_primary | wait_primary
db002-1 | 6 | db002-1:5432 | 0/6709C120 | read-write ! | wait_primary | demoted
2021-07-13 07:47:17.440282+02 | 0/6 | primary | primary | New state is reported by node 6 "db002-1" (db002-1:5432): "primary"
2021-07-13 11:18:27.359978+02 | 0/6 | primary | draining | Setting goal state of node 6 "db002-1" (db002-1:5432) to draining and node 4 "db002-2" (db002-2:5432) to prepare_promotion after a user-initiated failover.
2021-07-13 11:18:27.359978+02 | 0/4 | secondary | prepare_promotion | Setting goal state of node 6 "db002-1" (db002-1:5432) to draining and node 4 "db002-2" (db002-2:5432) to prepare_promotion after a user-initiated failover.
2021-07-13 11:18:28.320592+02 | 0/6 | primary | wait_primary | Setting goal state of node 6 "db002-1" (db002-1:5432) to wait_primary because none of the standby nodes in the quorum are healthy at the moment.
2021-07-13 11:18:28.503273+02 | 0/6 | wait_primary | wait_primary | New state is reported by node 6 "db002-1" (db002-1:5432): "wait_primary"
2021-07-13 11:18:28.768262+02 | 0/4 | prepare_promotion | prepare_promotion | New state is reported by node 4 "db002-2" (db002-2:5432): "prepare_promotion"
2021-07-13 11:18:28.768262+02 | 0/4 | prepare_promotion | stop_replication | Setting goal state of node 6 "db002-1" (db002-1:5432) to demote_timeout and node 4 "db002-2" (db002-2:5432) to stop_replication after node 4 "db002-2" (db002-2:5432) converged to prepare_promotion.
2021-07-13 11:18:28.768262+02 | 0/6 | wait_primary | demote_timeout | Setting goal state of node 6 "db002-1" (db002-1:5432) to demote_timeout and node 4 "db002-2" (db002-2:5432) to stop_replication after node 4 "db002-2" (db002-2:5432) converged to prepare_promotion.
2021-07-13 11:18:31.832889+02 | 0/4 | stop_replication | stop_replication | New state is reported by node 4 "db002-2" (db002-2:5432): "stop_replication"
2021-07-13 11:23:34.388164+02 | 0/6 | wait_primary | demote_timeout | Node node 6 "db002-1" (db002-1:5432) is marked as unhealthy by the monitor
2021-07-13 11:31:18.036089+02 | 0/4 | stop_replication | wait_primary | Setting goal state of node 4 "db002-2" (db002-2:5432) to wait_primary and node 6 "db002-1" (db002-1:5432) to demoted after the demote timeout expired.
2021-07-13 11:31:18.036089+02 | 0/6 | wait_primary | demoted | Setting goal state of node 4 "db002-2" (db002-2:5432) to wait_primary and node 6 "db002-1" (db002-1:5432) to demoted after the demote timeout expired.
2021-07-13 11:31:19.049409+02 | 0/4 | wait_primary | wait_primary | New state is reported by node 4 "db002-2" (db002-2:5432): "wait_primary"
2021-07-13 11:33:42.51511+02 | 0/4 | wait_primary | wait_primary | Node node 4 "db002-2" (db002-2:5432) is marked as unhealthy by the monitor
2021-07-13 11:35:48.654584+02 | 0/4 | wait_primary | wait_primary | Node node 4 "db002-2" (db002-2:5432) is marked as healthy by the monitor
Before reaching this state node 6 was primary and node 4 was secondary. We then issued a pg_autoctl perform failover which stopped at
2021-07-13 11:18:31.832889+02 | 0/4 | stop_replication | stop_replication | New state is reported by node 4 "db002-2" (db002-2:5432): "stop_replication"
for several minutes. Since node 4 stopped replication we figured that node 6 now needs to finally demote but it did not happen. So we restarted node 6 to see if it solves the issue but we ultimately ended up in the above final state. In this state node 6 did not continue anymore because it does not know how to transition from wait_primary to demoted.
We couldn't figure out how to recover from this state. What needs to be done if there is no transition path between current state and assigned state? Do we really have to drop node 6, deleting all data on node 6 and then create a new secondary from it?
We used some pg_autoctl developer/debug commands (standby init, fsm assign/step) in order to convince node 6 to become a secondary but nothing really helped. At the end we fully reset node 6 and its data and registered it again as a new secondary node.
Hi @jnehlmeier, thanks for another bug report. This one is troubling, having two nodes get both to wait_primary should never happen. Would you have kept pg_autoctl logs from the nodes from that test run?
Other than that, when all things break, the way forward is to drop a node and then create it again. In v1.6.1 it is possible at times to do that without having to delete the PGDATA, and the node can re-join as a secondary from the existing copy of the data set. Sometimes this requires an extra call to pg_rewind before hand, and this is not automated in pg_auto_failover yet. Sometimes the needed WAL files might be missing, too, and then you would have to delete the PGDATA and start afresh anyway.
Hi @jnehlmeier, thanks for another bug report. This one is troubling, having two nodes get both to wait_primary should never happen. Would you have kept pg_autoctl logs from the nodes from that test run?
Below are logs collected during this test. However the three hosts that run the containers did not had synchronized times so logging timestamps are not easily comparable. I have added the diff in seconds to the headline of each log section below.
As you can see in the logs the monitor did not proceed (and during that pause we restarted the nodes multiple times)
"11:18:31 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): stop_replication ➜ stop_replication"
"11:30:53 1 INFO pg_autoctl received signal SIGTERM, terminating"
because the monitor has put node 6 already in an unreachable state
MONITOR:
"11:18:28 14 INFO Setting goal state of node 6 ""db002-1"" (db002-1:5432) to draining and node 4 ""db002-2"" (db002-2:5432) to prepare_promotion after a user-initiated failover."
"11:18:28 14 INFO New state for node 6 ""db002-1"" (db002-1:5432): primary ➜ draining"
"11:18:28 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): secondary ➜ prepare_promotion"
"11:18:28 14 INFO Setting goal state of node 6 ""db002-1"" (db002-1:5432) to wait_primary because none of the standby nodes in the quorum are healthy at the moment."
"11:18:28 14 INFO New state for node 6 ""db002-1"" (db002-1:5432): primary ➜ wait_primary"
"11:18:28 14 INFO New state is reported by node 6 ""db002-1"" (db002-1:5432): ""wait_primary"""
"11:18:28 14 INFO New state for node 6 ""db002-1"" (db002-1:5432): wait_primary ➜ wait_primary"
"11:18:28 14 INFO Setting goal state of node 6 ""db002-1"" (db002-1:5432) to demote_timeout and node 4 ""db002-2"" (db002-2:5432) to stop_replication after node 4 ""db002-2"" (db002-2:5432) converged to prepare_promotion."
"11:18:28 14 INFO New state for node 6 ""db002-1"" (db002-1:5432): wait_primary ➜ demote_timeout"
NODE 6 (~+11s timestamp diff):
"11:18:39 30 INFO Monitor assigned new state ""wait_primary"""
"11:18:39 30 INFO FSM transition from ""primary"" to ""wait_primary"": Secondary became unhealthy"
"11:18:39 30 INFO Disabling synchronous replication"
"11:18:39 30 INFO Reloading Postgres configuration and HBA rules"
"11:18:39 30 INFO Transition complete: current state is now ""wait_primary"""
"11:18:39 30 INFO New state for node 4 ""db002-2"" (db002-2:5432): prepare_promotion ➜ prepare_promotion"
"11:18:39 30 INFO New state for node 4 ""db002-2"" (db002-2:5432): prepare_promotion ➜ stop_replication"
"11:18:39 30 INFO New state for this node (node 6, ""db002-1"") (db002-1:5432): wait_primary ➜ demote_timeout"
"11:18:39 30 INFO Monitor assigned new state ""demote_timeout"""
"11:18:39 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:18:39 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
Full logs
Monitor logs:
"11:18:28 14 INFO Setting goal state of node 6 ""db002-1"" (db002-1:5432) to draining and node 4 ""db002-2"" (db002-2:5432) to prepare_promotion after a user-initiated failover."
"11:18:28 14 INFO New state for node 6 ""db002-1"" (db002-1:5432): primary ➜ draining"
"11:18:28 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): secondary ➜ prepare_promotion"
"11:18:28 14 INFO Setting goal state of node 6 ""db002-1"" (db002-1:5432) to wait_primary because none of the standby nodes in the quorum are healthy at the moment."
"11:18:28 14 INFO New state for node 6 ""db002-1"" (db002-1:5432): primary ➜ wait_primary"
"11:18:28 14 INFO New state is reported by node 6 ""db002-1"" (db002-1:5432): ""wait_primary"""
"11:18:28 14 INFO New state for node 6 ""db002-1"" (db002-1:5432): wait_primary ➜ wait_primary"
"11:18:28 14 INFO Setting goal state of node 6 ""db002-1"" (db002-1:5432) to demote_timeout and node 4 ""db002-2"" (db002-2:5432) to stop_replication after node 4 ""db002-2"" (db002-2:5432) converged to prepare_promotion."
"11:18:28 14 INFO New state for node 6 ""db002-1"" (db002-1:5432): wait_primary ➜ demote_timeout"
"11:18:28 14 INFO New state is reported by node 4 ""db002-2"" (db002-2:5432): ""prepare_promotion"""
"11:18:28 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): prepare_promotion ➜ prepare_promotion"
"11:18:28 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): prepare_promotion ➜ stop_replication"
"11:18:31 14 INFO New state is reported by node 4 ""db002-2"" (db002-2:5432): ""stop_replication"""
"11:18:31 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): stop_replication ➜ stop_replication"
"11:30:53 1 INFO pg_autoctl received signal SIGTERM, terminating"
"11:30:53 14 INFO Listener service received signal SIGTERM, terminating"
"11:30:53 13 INFO Postgres controller service received signal SIGTERM, terminating"
"11:30:53 13 INFO Stopping pg_autoctl postgres service"
"11:30:53 13 INFO /usr/lib/postgresql/13/bin/pg_ctl --pgdata /var/lib/postgresql/monitor --wait stop --mode fast"
"11:30:53 1 INFO Waiting for subprocesses to terminate."
"11:30:53 1 ERROR Unknown subprocess died with pid 24"
"11:30:53 1 INFO Stop pg_autoctl"
... USER-INITIATED CONTAINER RESTART ...
"Given log level not defined or unknown"
"Using pg_auto_failover log level INFO ..."
"Generating .pgpass ... "
"Monitor already created."
"Generating pg_hba.conf using allowed subnet 10.41.1.0/24 ..."
"Starting pg_autoctl..."
"11:30:55 1 INFO Started pg_autoctl postgres service with pid 13"
"11:30:55 13 INFO /usr/bin/pg_autoctl do service postgres --pgdata /var/lib/postgresql/monitor -v"
"11:30:55 1 INFO Started pg_autoctl listener service with pid 14"
"11:30:55 14 INFO /usr/bin/pg_autoctl do service listener --pgdata /var/lib/postgresql/monitor -v"
"11:30:55 14 INFO Managing the monitor at postgres://autoctl_node@db002-monitor:5432/pg_auto_failover?sslmode=require"
"11:30:55 14 INFO Reloaded the new configuration from ""/home/postgres/.config/pg_autoctl/var/lib/postgresql/monitor/pg_autoctl.cfg"""
"11:30:56 23 INFO /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/monitor -p 5432 -h *"
"11:30:56 14 INFO The version of extension ""pgautofailover"" is ""1.5"" on the monitor"
"11:30:56 14 INFO Contacting the monitor to LISTEN to its events."
"11:30:56 13 INFO Postgres is now serving PGDATA ""/var/lib/postgresql/monitor"" on port 5432 with pid 23"
"11:31:18 14 INFO Setting goal state of node 4 ""db002-2"" (db002-2:5432) to wait_primary and node 6 ""db002-1"" (db002-1:5432) to demoted after the demote timeout expired."
"11:31:18 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): stop_replication ➜ wait_primary"
"11:31:18 14 INFO New state for node 6 ""db002-1"" (db002-1:5432): wait_primary ➜ demoted"
"11:31:19 14 INFO New state is reported by node 4 ""db002-2"" (db002-2:5432): ""wait_primary"""
"11:31:19 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): wait_primary ➜ wait_primary"
"13:10:25 14 INFO Node node 4 ""db002-2"" (db002-2:5432) is marked as unhealthy by the monitor"
"13:10:25 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): wait_primary ➜ wait_primary"
"13:10:25 14 INFO Node node 4 ""db002-2"" (db002-2:5432) is marked as healthy by the monitor"
"13:10:25 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): wait_primary ➜ wait_primary"
"13:10:25 14 INFO Removing node 6 ""db002-1"" (db002-1:5432) from formation ""default"" and group 0"
"13:10:25 14 INFO Setting number_sync_standbys to 0 for formation ""default"" now that we have 0 standby nodes set with replication-quorum."
"13:10:25 14 INFO Setting goal state of node 4 ""db002-2"" (db002-2:5432) to single as there is no other node."
"13:10:25 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): wait_primary ➜ single"
"13:10:25 14 INFO Setting goal state of node 4 ""db002-2"" (db002-2:5432) to single as there is no other node."
"13:10:25 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): wait_primary ➜ single"
"13:10:26 14 INFO New state is reported by node 4 ""db002-2"" (db002-2:5432): ""single"""
"13:10:26 14 INFO New state for node 4 ""db002-2"" (db002-2:5432): single ➜ single"
"13:48:17 1 INFO pg_autoctl received signal SIGTERM, terminating"
"13:48:17 13 INFO Postgres controller service received signal SIGTERM, terminating"
"13:48:17 14 INFO Listener service received signal SIGTERM, terminating"
"13:48:17 13 INFO Stopping pg_autoctl postgres service"
"13:48:17 13 INFO /usr/lib/postgresql/13/bin/pg_ctl --pgdata /var/lib/postgresql/monitor --wait stop --mode fast"
"13:48:17 1 ERROR Unknown subprocess died with pid 24"
"13:48:18 1 INFO Stop pg_autoctl"
Node 6 (was primary before pg_autoctl perform failover, timestamps are ~+11s to monitor)
"11:18:39 30 INFO Monitor assigned new state ""wait_primary"""
"11:18:39 30 INFO FSM transition from ""primary"" to ""wait_primary"": Secondary became unhealthy"
"11:18:39 30 INFO Disabling synchronous replication"
"11:18:39 30 INFO Reloading Postgres configuration and HBA rules"
"11:18:39 30 INFO Transition complete: current state is now ""wait_primary"""
"11:18:39 30 INFO New state for node 4 ""db002-2"" (db002-2:5432): prepare_promotion ➜ prepare_promotion"
"11:18:39 30 INFO New state for node 4 ""db002-2"" (db002-2:5432): prepare_promotion ➜ stop_replication"
"11:18:39 30 INFO New state for this node (node 6, ""db002-1"") (db002-1:5432): wait_primary ➜ demote_timeout"
"11:18:39 30 INFO Monitor assigned new state ""demote_timeout"""
"11:18:39 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:18:39 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
"11:18:40 30 INFO Monitor assigned new state ""demote_timeout"""
"11:18:40 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:18:40 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
"11:18:41 30 INFO Monitor assigned new state ""demote_timeout"""
"11:18:41 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:18:41 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
"11:18:42 30 INFO New state for node 4 ""db002-2"" (db002-2:5432): stop_replication ➜ stop_replication"
"11:18:42 30 INFO Monitor assigned new state ""demote_timeout"""
"11:18:42 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:18:42 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
"11:18:43 30 INFO Monitor assigned new state ""demote_timeout"""
"11:18:43 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:18:43 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
... LOTS OF RETRIES ...
"11:23:33 30 INFO Monitor assigned new state ""demote_timeout"""
"11:23:33 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:23:33 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
"11:23:34 1 INFO pg_autoctl received signal SIGTERM, terminating"
"11:23:34 29 INFO Postgres controller service received signal SIGTERM, terminating"
"11:23:34 29 INFO Stopping pg_autoctl postgres service"
"11:23:34 29 INFO /usr/lib/postgresql/13/bin/pg_ctl --pgdata /var/lib/postgresql/node --wait stop --mode fast"
"11:23:34 30 INFO Monitor assigned new state ""demote_timeout"""
"11:23:34 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:23:34 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
"11:23:34 1 INFO Stop pg_autoctl"
... USER-INITIATED CONTAINER RESTART ...
"Given log level not defined or unknown"
"Using pg_auto_failover log level INFO ..."
"Generating .pgpass file ... "
"pg_auto_failover node already created."
"Setting replication.password ..."
"Node configuration done."
"Starting pg_autoctl..."
"11:23:35 1 INFO Started pg_autoctl postgres service with pid 29"
"11:23:35 29 INFO /usr/bin/pg_autoctl do service postgres --pgdata /var/lib/postgresql/node -v"
"11:23:35 1 INFO Started pg_autoctl node-active service with pid 30"
"11:23:35 30 INFO /usr/bin/pg_autoctl do service node-active --pgdata /var/lib/postgresql/node -v"
"11:23:35 30 INFO Reloaded the new configuration from ""/home/postgres/.config/pg_autoctl/var/lib/postgresql/node/pg_autoctl.cfg"""
"11:23:35 30 INFO pg_autoctl service is running, current state is ""wait_primary"""
"11:23:35 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:23:36 30 INFO Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes."
"11:23:36 30 INFO Checking for HBA rules for node 4 ""db002-2"" (db002-2:5432)"
"11:23:36 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl replication ""pgautofailover_replicator"" db002-2 trust"
"11:23:36 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl ""postgres"" ""pgautofailover_replicator"" db002-2 trust"
"11:23:36 30 INFO Monitor assigned new state ""demote_timeout"""
"11:23:36 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:23:36 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
"11:23:37 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:23:37 30 INFO Monitor assigned new state ""demote_timeout"""
"11:23:37 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:23:37 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
... LOTS OF RETRIES ...
"11:29:56 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:29:56 30 INFO Monitor assigned new state ""demote_timeout"""
"11:29:56 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:29:56 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
"11:29:57 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:29:57 30 INFO Monitor assigned new state ""demote_timeout"""
"11:29:57 30 FATAL pg_autoctl does not know how to reach state ""demote_timeout"" from ""wait_primary"""
"11:29:57 30 ERROR Failed to transition to state ""demote_timeout"", retrying... "
"11:29:58 1 INFO pg_autoctl received signal SIGTERM, terminating"
"11:29:58 29 INFO Postgres controller service received signal SIGTERM, terminating"
"11:29:58 1 INFO Stop pg_autoctl"
... USER-INITIATED CONTAINER RESTART ...
"Given log level not defined or unknown"
"Using pg_auto_failover log level INFO ..."
"Generating .pgpass file ... "
"pg_auto_failover node already created."
"Setting replication.password ..."
"Node configuration done."
"Starting pg_autoctl..."
"11:32:25 1 INFO Started pg_autoctl postgres service with pid 29"
"11:32:25 29 INFO /usr/bin/pg_autoctl do service postgres --pgdata /var/lib/postgresql/node -v"
"11:32:25 1 INFO Started pg_autoctl node-active service with pid 30"
"11:32:25 30 INFO /usr/bin/pg_autoctl do service node-active --pgdata /var/lib/postgresql/node -v"
"11:32:25 30 INFO Reloaded the new configuration from ""/home/postgres/.config/pg_autoctl/var/lib/postgresql/node/pg_autoctl.cfg"""
"11:32:25 30 INFO pg_autoctl service is running, current state is ""wait_primary"""
"11:32:25 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:32:25 30 INFO Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes."
"11:32:25 30 INFO Checking for HBA rules for node 4 ""db002-2"" (db002-2:5432)"
"11:32:25 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl replication ""pgautofailover_replicator"" db002-2 trust"
"11:32:25 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl ""postgres"" ""pgautofailover_replicator"" db002-2 trust"
"11:32:25 30 INFO Monitor assigned new state ""demoted"""
"11:32:25 30 FATAL pg_autoctl does not know how to reach state ""demoted"" from ""wait_primary"""
"11:32:25 30 ERROR Failed to transition to state ""demoted"", retrying... "
"11:32:27 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:32:27 30 INFO Monitor assigned new state ""demoted"""
"11:32:27 30 FATAL pg_autoctl does not know how to reach state ""demoted"" from ""wait_primary"""
"11:32:27 30 ERROR Failed to transition to state ""demoted"", retrying... "
... LOTS OF RETRIES ....
"11:33:32 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:33:33 30 INFO Monitor assigned new state ""demoted"""
"11:33:33 30 FATAL pg_autoctl does not know how to reach state ""demoted"" from ""wait_primary"""
"11:33:33 30 ERROR Failed to transition to state ""demoted"", retrying... "
"11:33:34 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:33:34 1 INFO pg_autoctl received signal SIGTERM, terminating"
"11:33:34 29 INFO Postgres controller service received signal SIGTERM, terminating"
"11:33:34 30 INFO Monitor assigned new state ""demoted"""
"11:33:34 30 FATAL pg_autoctl does not know how to reach state ""demoted"" from ""wait_primary"""
"11:33:34 30 ERROR Failed to transition to state ""demoted"", retrying... "
"11:33:34 1 INFO Stop pg_autoctl"
... USER-INITIATED CONTAINER RESTART ....
"Given log level not defined or unknown"
"Using pg_auto_failover log level INFO ..."
"Generating .pgpass file ... "
"pg_auto_failover node already created."
"Setting replication.password ..."
"Node configuration done."
"Starting pg_autoctl..."
"11:33:52 1 INFO Started pg_autoctl postgres service with pid 29"
"11:33:52 29 INFO /usr/bin/pg_autoctl do service postgres --pgdata /var/lib/postgresql/node -v"
"11:33:52 1 INFO Started pg_autoctl node-active service with pid 30"
"11:33:52 30 INFO /usr/bin/pg_autoctl do service node-active --pgdata /var/lib/postgresql/node -v"
"11:33:52 30 INFO Reloaded the new configuration from ""/home/postgres/.config/pg_autoctl/var/lib/postgresql/node/pg_autoctl.cfg"""
"11:33:52 30 INFO pg_autoctl service is running, current state is ""wait_primary"""
"11:33:52 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:33:53 30 INFO Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes."
"11:33:53 30 INFO Checking for HBA rules for node 4 ""db002-2"" (db002-2:5432)"
"11:33:53 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl replication ""pgautofailover_replicator"" db002-2 trust"
"11:33:53 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl ""postgres"" ""pgautofailover_replicator"" db002-2 trust"
"11:33:53 30 INFO Monitor assigned new state ""demoted"""
"11:33:53 30 FATAL pg_autoctl does not know how to reach state ""demoted"" from ""wait_primary"""
"11:33:53 30 ERROR Failed to transition to state ""demoted"", retrying... "
"11:33:54 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:33:54 30 INFO Monitor assigned new state ""demoted"""
"11:33:54 30 FATAL pg_autoctl does not know how to reach state ""demoted"" from ""wait_primary"""
"11:33:54 30 ERROR Failed to transition to state ""demoted"", retrying... "
"11:33:55 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:33:55 30 INFO Monitor assigned new state ""demoted"""
"11:33:55 30 FATAL pg_autoctl does not know how to reach state ""demoted"" from ""wait_primary"""
"11:33:55 30 ERROR Failed to transition to state ""demoted"", retrying... "
... ENDLESS RETRIES UNTIL WE STOPPED THE TEST ...
Node 4 (was secondary before pg_autoctl perform failover, timestamps are ~+13s to monitor)
"11:18:41 30 ERROR Monitor ERROR: deadlock detected"
"11:18:41 30 ERROR Monitor DETAIL: Process 616799 waits for ExclusiveLock on advisory lock [16385,822708183,0,11]; blocked by process 616803."
"11:18:41 30 ERROR Monitor Process 616803 waits for ShareLock on transaction 341083; blocked by process 616799."
"11:18:41 30 ERROR Monitor HINT: See server log for query details."
"11:18:41 30 ERROR Failed to get node state for node 4 in group 0 of formation ""default"" with initial state ""secondary"", replication state """", and current lsn ""0/6709C120"", see previous lines for details"
"11:18:41 30 ERROR Failed to get the goal state from the monitor"
"11:18:41 30 INFO New state for node 6 ""db002-1"" (db002-1:5432): wait_primary ➜ wait_primary"
"11:18:41 30 INFO Successfully got the goal state from the monitor"
"11:18:41 30 INFO FSM transition from ""secondary"" to ""prepare_promotion"": Stop traffic to primary, wait for it to finish draining."
"11:18:41 30 INFO Monitor assigned new state ""prepare_promotion"""
"11:18:41 30 INFO Transition complete: current state is now ""prepare_promotion"""
"11:18:42 30 INFO FSM transition from ""prepare_promotion"" to ""stop_replication"": Prevent against split-brain situations."
"11:18:42 30 INFO Setting default_transaction_read_only to on"
"11:18:42 30 INFO Monitor assigned new state ""stop_replication"""
"11:18:42 30 INFO Prevent writes to the promoted standby while the primary is not demoted yet, by making the service incompatible with target_session_attrs = read-write"
"11:18:42 30 INFO Reloading Postgres configuration and HBA rules"
"11:18:42 30 INFO Promoting postgres"
"11:18:42 30 INFO Waiting for postgres to promote"
"11:18:44 30 INFO Cleaning-up Postgres replication settings"
"11:18:44 30 INFO Disabling synchronous replication"
"11:18:44 30 INFO Reloading Postgres configuration and HBA rules"
"11:18:45 30 INFO Transition complete: current state is now ""stop_replication"""
"11:18:45 30 INFO New state for this node (node 4, ""db002-2"") (db002-2:5432): prepare_promotion ➜ prepare_promotion"
"11:18:45 30 INFO New state for this node (node 4, ""db002-2"") (db002-2:5432): prepare_promotion ➜ stop_replication"
"11:18:45 30 INFO New state for node 6 ""db002-1"" (db002-1:5432): wait_primary ➜ demote_timeout"
"11:18:45 30 INFO New state for this node (node 4, ""db002-2"") (db002-2:5432): stop_replication ➜ stop_replication"
"11:20:57 1 INFO pg_autoctl received signal SIGTERM, terminating"
"11:20:57 29 INFO Postgres controller service received signal SIGTERM, terminating"
"11:20:57 29 INFO Stopping pg_autoctl postgres service"
"11:20:57 29 INFO /usr/lib/postgresql/13/bin/pg_ctl --pgdata /var/lib/postgresql/node --wait stop --mode fast"
"11:20:57 1 INFO Waiting for subprocesses to terminate."
"11:20:59 1 ERROR Unknown subprocess died with pid 105884"
"11:20:59 1 INFO Stop pg_autoctl"
... USER-INITIATED CONTAINER RESTART ...
"Given log level not defined or unknown"
"Using pg_auto_failover log level INFO ..."
"Generating .pgpass file ... "
"pg_auto_failover node already created."
"Setting replication.password ..."
"Node configuration done."
"Starting pg_autoctl..."
"11:21:02 1 INFO Started pg_autoctl postgres service with pid 29"
"11:21:02 29 INFO /usr/bin/pg_autoctl do service postgres --pgdata /var/lib/postgresql/node -v"
"11:21:02 1 INFO Started pg_autoctl node-active service with pid 30"
"11:21:02 30 INFO /usr/bin/pg_autoctl do service node-active --pgdata /var/lib/postgresql/node -v"
"11:21:02 30 INFO Reloaded the new configuration from ""/home/postgres/.config/pg_autoctl/var/lib/postgresql/node/pg_autoctl.cfg"""
"11:21:02 30 INFO Contents of ""/var/lib/postgresql/node/postgresql-auto-failover.conf"" have changed, overwriting"
"11:21:02 30 INFO pg_autoctl service is running, current state is ""stop_replication"""
"11:21:02 30 INFO Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes."
"11:21:02 30 INFO Checking for HBA rules for node 6 ""db002-1"" (db002-1:5432)"
"11:21:02 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl replication ""pgautofailover_replicator"" db002-1 trust"
"11:21:02 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl ""postgres"" ""pgautofailover_replicator"" db002-1 trust"
"11:21:02 38 INFO /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/node -p 5432 -h *"
"11:21:03 29 INFO Postgres is now serving PGDATA ""/var/lib/postgresql/node"" on port 5432 with pid 38"
"11:21:03 30 WARN PostgreSQL was not running, restarted with pid 38"
"11:30:06 1 INFO pg_autoctl received signal SIGTERM, terminating"
"11:30:06 29 INFO Postgres controller service received signal SIGTERM, terminating"
"11:30:06 29 INFO Stopping pg_autoctl postgres service"
"11:30:06 29 INFO /usr/lib/postgresql/13/bin/pg_ctl --pgdata /var/lib/postgresql/node --wait stop --mode fast"
"11:30:06 1 INFO Waiting for subprocesses to terminate."
"11:30:06 1 ERROR Unknown subprocess died with pid 39"
"11:30:07 1 INFO Stop pg_autoctl"
... USER-INITIATED CONTAINER RESTART ...
"Given log level not defined or unknown"
"Using pg_auto_failover log level INFO ..."
"Generating .pgpass file ... "
"pg_auto_failover node already created."
"Setting replication.password ..."
"Node configuration done."
"Starting pg_autoctl..."
"11:31:31 1 INFO Started pg_autoctl postgres service with pid 29"
"11:31:31 29 INFO /usr/bin/pg_autoctl do service postgres --pgdata /var/lib/postgresql/node -v"
"11:31:31 1 INFO Started pg_autoctl node-active service with pid 30"
"11:31:31 30 INFO /usr/bin/pg_autoctl do service node-active --pgdata /var/lib/postgresql/node -v"
"11:31:31 30 INFO Reloaded the new configuration from ""/home/postgres/.config/pg_autoctl/var/lib/postgresql/node/pg_autoctl.cfg"""
"11:31:31 30 INFO pg_autoctl service is running, current state is ""stop_replication"""
"11:31:31 30 INFO Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes."
"11:31:31 30 INFO Checking for HBA rules for node 6 ""db002-1"" (db002-1:5432)"
"11:31:31 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl replication ""pgautofailover_replicator"" db002-1 trust"
"11:31:31 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl ""postgres"" ""pgautofailover_replicator"" db002-1 trust"
"11:31:31 30 INFO Monitor assigned new state ""wait_primary"""
"11:31:31 38 INFO /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/node -p 5432 -h *"
"11:31:31 29 INFO Postgres is now serving PGDATA ""/var/lib/postgresql/node"" on port 5432 with pid 38"
"11:31:32 30 WARN PostgreSQL was not running, restarted with pid 38"
"11:31:32 30 INFO FSM transition from ""stop_replication"" to ""wait_primary"": Confirmed promotion with the monitor"
"11:31:32 30 INFO Setting default_transaction_read_only to off"
"11:31:32 30 INFO Reloading Postgres configuration and HBA rules"
"11:31:32 30 INFO Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes."
"11:31:32 30 INFO Checking for HBA rules for node 6 ""db002-1"" (db002-1:5432)"
"11:31:32 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl replication ""pgautofailover_replicator"" db002-1 trust"
"11:31:32 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl ""postgres"" ""pgautofailover_replicator"" db002-1 trust"
"11:31:32 30 INFO Transition complete: current state is now ""wait_primary"""
... SOME SHUTDOWN LOGS MISSING ...
... USER-INITIATED CONTAINER RESTART ...
"Given log level not defined or unknown"
"Using pg_auto_failover log level INFO ..."
"Generating .pgpass file ... "
"pg_auto_failover node already created."
"Setting replication.password ..."
"Node configuration done."
"Starting pg_autoctl..."
"11:35:59 1 INFO Started pg_autoctl postgres service with pid 29"
"11:35:59 29 INFO /usr/bin/pg_autoctl do service postgres --pgdata /var/lib/postgresql/node -v"
"11:35:59 1 INFO Started pg_autoctl node-active service with pid 30"
"11:35:59 30 INFO /usr/bin/pg_autoctl do service node-active --pgdata /var/lib/postgresql/node -v"
"11:35:59 30 INFO Reloaded the new configuration from ""/home/postgres/.config/pg_autoctl/var/lib/postgresql/node/pg_autoctl.cfg"""
"11:35:59 30 INFO pg_autoctl service is running, current state is ""wait_primary"""
"11:35:59 30 WARN Failed to update the keeper's state from the local PostgreSQL instance."
"11:35:59 30 INFO Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes."
"11:35:59 30 INFO Checking for HBA rules for node 6 ""db002-1"" (db002-1:5432)"
"11:35:59 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl replication ""pgautofailover_replicator"" db002-1 trust"
"11:35:59 30 WARN Skipping HBA edits (per --skip-pg-hba) for rule: hostssl ""postgres"" ""pgautofailover_replicator"" db002-1 trust"
"11:35:59 38 INFO /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/node -p 5432 -h *"
"11:35:59 30 WARN PostgreSQL was not running, restarted with pid 38"
"11:35:59 29 INFO Postgres is now serving PGDATA ""/var/lib/postgresql/node"" on port 5432 with pid 38"
"11:36:00 30 INFO Updated the keeper's state from the local PostgreSQL instance, which is running"
"11:36:00 30 INFO pg_autoctl managed to ensure current state ""wait_primary"": PostgreSQL is running"
"12:11:37 30 INFO Creating replication slot ""pgautofailover_standby_6"""
"13:10:39 30 INFO New state for this node (node 4, ""db002-2"") (db002-2:5432): wait_primary ➜ single"
"13:10:39 30 INFO Monitor assigned new state ""single"""
"13:10:39 30 INFO Dropping replication slot ""pgautofailover_standby_6"""
"13:10:39 30 INFO FSM transition from ""wait_primary"" to ""single"": Other node was forcibly removed, now single"
"13:10:39 30 INFO Disabling synchronous replication"
"13:10:39 30 INFO Reloading Postgres configuration and HBA rules"
"13:10:39 30 INFO Transition complete: current state is now ""single"""
"13:10:39 30 INFO New state for this node (node 4, ""db002-2"") (db002-2:5432): wait_primary ➜ single"
"13:10:39 30 INFO New state for this node (node 4, ""db002-2"") (db002-2:5432): single ➜ single"
"13:34:40 1 INFO pg_autoctl received signal SIGTERM, terminating"
"13:34:40 29 INFO Postgres controller service received signal SIGTERM, terminating"
"13:34:40 29 INFO Stopping pg_autoctl postgres service"
"13:34:40 29 INFO /usr/lib/postgresql/13/bin/pg_ctl --pgdata /var/lib/postgresql/node --wait stop --mode fast"
"13:34:40 1 INFO Waiting for subprocesses to terminate."
"13:34:40 1 ERROR Unknown subprocess died with pid 39"
"13:34:40 1 INFO Stop pg_autoctl"
On a first reading it looks like the secondary node is failing right after a manual failover has been started. Is that correct? If that's the case, I will play with testing this scenario and adding coverage in the test suite (unless we already have it).
Also please note that if we're seeing a failure during a failover operation, then we are in the territory of “multiple node failures” which where we have yet work to do. Our model at the moment is meant to fully tolerate the failure of any one node at a time, and knows how to handle multiple failures in some cases only.
Both postgres nodes were running at the time of failover. I believe there was some docker networking hiccup and pg_auto_failover already knew that the secondary is gone but did not take any action yet (maybe some timeout wasn't reached yet). However it did take the action as soon as we manually triggered the failover process. Sadly it does it at a bad time and breaks FSM.
Basically the primary should progress primary -> draining -> demote_timeout. But during that transition the monitor sets the state of the primary to wait_primary since the secondary wasn't reachable. When the monitor does this, it should not continue with the usual (requested) failover transition because that's why we end up in wait_primary -> demote_timeout.