All Slaves follow only from node1

Open utarn opened this issue 7 years ago • 1 comments

I run the docker-compose.yml from the lastest.yml. Look like it cannot connect to the 1st tier slave node.

# gosu postgres repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string                                          
----+-------+---------+-----------+----------+----------+-----------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | user=replication_user password=replication_pass host=pgmaster dbname=replication_db port=5432 connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | user=replication_user password=replication_pass host=pgslave1 dbname=replication_db port=5432 connect_timeout=2
 3  | node3 | standby |   running | node1    | default  | user=replication_user password=replication_pass host=pgslave2 dbname=replication_db port=5432 connect_timeout=2
 4  | node4 | standby |   running | node1    | default  | user=replication_user password=replication_pass host=pgslave3 dbname=replication_db port=5432 connect_timeout=2
 5  | node5 | standby |   running | node1    | default  | user=replication_user password=replication_pass host=pgslave4 dbname=replication_db port=5432 connect_timeout=2

Here is what is shown in pgslave2

pgslave2_1_645dab692bcb | >>> Setting up STOP handlers...
pgslave2_1_645dab692bcb | >>> STARTING SSH (if required)...
pgslave2_1_645dab692bcb | cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
pgslave2_1_645dab692bcb | No pre-populated ssh keys!
pgslave2_1_645dab692bcb | >>> SSH is not enabled!
pgslave2_1_645dab692bcb | >>> STARTING POSTGRES...
pgslave2_1_645dab692bcb | >>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
pgslave2_1_645dab692bcb | >>> TUNING UP POSTGRES...
pgslave2_1_645dab692bcb | >>> Cleaning data folder which might have some garbage...
pgslave2_1_645dab692bcb | >>> Auto-detected master name: ''
pgslave2_1_645dab692bcb | >>> Setting up repmgr...
pgslave2_1_645dab692bcb | >>> Setting up repmgr config file '/etc/repmgr.conf'...
pgslave2_1_645dab692bcb | >>> Setting up upstream node...
pgslave2_1_645dab692bcb | cat: /var/lib/postgresql/data/standby.lock: No such file or directory
pgslave2_1_645dab692bcb | >>> Previously Locked standby upstream node LOCKED_STANDBY=''
pgslave2_1_645dab692bcb | >>> Waiting for upstream postgres server...
pgslave2_1_645dab692bcb | >>> Wait schema replication_db.repmgr on pgslave1:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave2_1_645dab692bcb | psql: could not connect to server: Connection refused
pgslave2_1_645dab692bcb |       Is the server running on host "pgslave1" (172.20.0.5) and accepting
pgslave2_1_645dab692bcb |       TCP/IP connections on port 5432?
pgslave2_1_645dab692bcb | >>>>>> Host pgslave1:5432 is not accessible (will try 30 times more)
pgslave2_1_645dab692bcb | psql: could not connect to server: Connection refused
pgslave2_1_645dab692bcb |       Is the server running on host "pgslave1" (172.20.0.5) and accepting
pgslave2_1_645dab692bcb |       TCP/IP connections on port 5432?
pgslave2_1_645dab692bcb | >>>>>> Host pgslave1:5432 is not accessible (will try 29 times more)
pgslave2_1_645dab692bcb | psql: could not connect to server: Connection refused
pgslave2_1_645dab692bcb |       Is the server running on host "pgslave1" (172.20.0.5) and accepting
pgslave2_1_645dab692bcb |       TCP/IP connections on port 5432?
pgslave2_1_645dab692bcb | >>>>>> Host pgslave1:5432 is not accessible (will try 28 times more)
pgslave2_1_645dab692bcb | psql: could not connect to server: Connection refused
pgslave2_1_645dab692bcb |       Is the server running on host "pgslave1" (172.20.0.5) and accepting
pgslave2_1_645dab692bcb |       TCP/IP connections on port 5432?
pgslave2_1_645dab692bcb | >>>>>> Host pgslave1:5432 is not accessible (will try 27 times more)
pgslave2_1_645dab692bcb | >>>>>> Schema replication_db.repmgr exists on host pgslave1:5432!
pgslave2_1_645dab692bcb | >>> Waiting for upstream node id replication_db on pgslave1:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave2_1_645dab692bcb | >>>>>> Upstream node id is still not accessible on host pgslave1:5432 (will try 30 times more)
pgslave2_1_645dab692bcb | >>>>>> Upstream node id is still not accessible on host pgslave1:5432 (will try 29 times more)
pgslave2_1_645dab692bcb | >>>>>> Upstream node id is still not accessible on host pgslave1:5432 (will try 28 times more)
pgslave2_1_645dab692bcb | >>>>>> Upstream node id 2 received from host pgslave1:5432!
pgslave2_1_645dab692bcb | >>> REPLICATION_UPSTREAM_NODE_ID=2
pgslave2_1_645dab692bcb | >>> Sending in background postgres start...
pgslave2_1_645dab692bcb | >>> Waiting for upstream postgres server...
pgslave2_1_645dab692bcb | >>> Wait schema replication_db.repmgr on pgslave1:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave2_1_645dab692bcb | >>>>>> Schema replication_db.repmgr exists on host pgslave1:5432!
pgslave2_1_645dab692bcb | >>> Starting standby node...
pgslave2_1_645dab692bcb | >>> Instance hasn't been set up yet.
pgslave2_1_645dab692bcb | >>> Clonning primary node...
pgslave2_1_645dab692bcb | >>> Waiting for upstream postgres server...
pgslave2_1_645dab692bcb | >>> Wait schema replication_db.repmgr on pgslave1:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave2_1_645dab692bcb | NOTICE: destination directory "/var/lib/postgresql/data" provided
pgslave2_1_645dab692bcb | INFO: connecting to source node
pgslave2_1_645dab692bcb | DETAIL: connection string is: host=pgslave1 user=replication_user port=5432 dbname=replication_db
pgslave2_1_645dab692bcb | DETAIL: current installation size is 37 MB
pgslave2_1_645dab692bcb | >>>>>> Schema replication_db.repmgr exists on host pgslave1:5432!
pgslave2_1_645dab692bcb | >>> Waiting for cloning on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
pgslave2_1_645dab692bcb | >>> Replicated: 0
pgslave2_1_645dab692bcb | INFO: checking and correcting permissions on existing directory "/var/lib/postgresql/data"
pgslave2_1_645dab692bcb | NOTICE: starting backup (using pg_basebackup)...
pgslave2_1_645dab692bcb | INFO: executing:
pgslave2_1_645dab692bcb |   /usr/lib/postgresql/10/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/data -h pgslave1 -p 5432 -U replication_user -c fast -X stream -S repmgr_slot_3
pgslave2_1_645dab692bcb | NOTICE: creating replication slot "repmgr_slot_3" on upstream node 1
pgslave2_1_645dab692bcb | NOTICE: replication slot "repmgr_slot_3" deleted on source node
pgslave2_1_645dab692bcb | NOTICE: standby clone (using pg_basebackup) complete
pgslave2_1_645dab692bcb | NOTICE: you can now start your PostgreSQL server
pgslave2_1_645dab692bcb | HINT: for example: pg_ctl -D /var/lib/postgresql/data start
pgslave2_1_645dab692bcb | HINT: after starting the server, you need to register this standby with "repmgr standby register"
pgslave2_1_645dab692bcb | INFO: executing notification command for event "standby_clone"
pgslave2_1_645dab692bcb | DETAIL: command is:
pgslave2_1_645dab692bcb |   /usr/local/bin/cluster/repmgr/events/router.sh 3 standby_clone 1 "2018-12-26 23:57:47.269342+00" "cloned from host \"pgslave1\", port 5432; backup method: pg_basebackup; --force: Y"
pgslave2_1_645dab692bcb | [REPMGR EVENT] Node id: 3; Event type: standby_clone; Success [1|0]: 1; Time: 2018-12-26 23:57:47.269342+00;  Details: cloned from host "pgslave1", port 5432; backup method: pg_basebackup; --force: Y
pgslave2_1_645dab692bcb | >>> Configuring /var/lib/postgresql/data/postgresql.conf
pgslave2_1_645dab692bcb | >>>>>> Will add configs to the exists file
pgslave2_1_645dab692bcb | >>>>>> Adding config 'listen_addresses'=''*''
pgslave2_1_645dab692bcb | >>>>>> Adding config 'shared_preload_libraries'=''repmgr''
pgslave2_1_645dab692bcb | >>> Starting postgres...
pgslave2_1_645dab692bcb | >>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
pgslave2_1_645dab692bcb | >>> Recovery is in progress:
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.303 UTC [195] LOG:  listening on IPv4 address "0.0.0.0", port 5432
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.304 UTC [195] LOG:  listening on IPv6 address "::", port 5432
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.308 UTC [195] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.340 UTC [204] LOG:  database system was interrupted while in recovery at log time 2018-12-26 23:57:06 UTC
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.340 UTC [204] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.360 UTC [204] LOG:  entering standby mode
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.362 UTC [204] LOG:  redo starts at 0/3000028
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.363 UTC [204] LOG:  consistent recovery state reached at 0/5000908
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.363 UTC [204] LOG:  invalid record length at 0/5000908: wanted 24, got 0
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.364 UTC [195] LOG:  database system is ready to accept read only connections
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.376 UTC [208] LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
pgslave2_1_645dab692bcb | 2018-12-26 23:57:48.097 UTC [209] LOG:  incomplete startup packet
pgslave2_1_645dab692bcb | >>> Cloning is done
pgslave2_1_645dab692bcb | >>>>>> WAL id: 000000010000000000000005
pgslave2_1_645dab692bcb | >>>>>> WAL_RECEIVER_FLAG=1!
pgslave2_1_645dab692bcb | >>> Not in recovery state (anymore)
pgslave2_1_645dab692bcb | >>> Waiting for local postgres server start...
pgslave2_1_645dab692bcb | >>> Wait schema replication_db.public on pgslave2:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
pgslave2_1_645dab692bcb | >>>>>> Schema replication_db.public exists on host pgslave2:5432!
pgslave2_1_645dab692bcb | >>> Unregister the node if it was done before
pgslave2_1_645dab692bcb | ERROR:  cannot execute DELETE in a read-only transaction
pgslave2_1_645dab692bcb | >>> Registering node with role standby
pgslave2_1_645dab692bcb | INFO: connecting to local node "node3" (ID: 3)
pgslave2_1_645dab692bcb | INFO: connecting to primary database
pgslave2_1_645dab692bcb | WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
pgslave2_1_645dab692bcb | INFO: executing notification command for event "standby_register"
pgslave2_1_645dab692bcb | DETAIL: command is:
pgslave2_1_645dab692bcb |   /usr/local/bin/cluster/repmgr/events/router.sh 3 standby_register 1 "2018-12-26 23:58:17.474092+00" "standby registration succeeded (-F/--force option was used)"
pgslave2_1_645dab692bcb | [REPMGR EVENT] Node id: 3; Event type: standby_register; Success [1|0]: 1; Time: 2018-12-26 23:58:17.474092+00;  Details: standby registration succeeded (-F/--force option was used)
pgslave2_1_645dab692bcb | INFO: standby registration complete
pgslave2_1_645dab692bcb | NOTICE: standby node "node3" (id: 3) successfully registered
pgslave2_1_645dab692bcb |  Locking standby (NEW_UPSTREAM_NODE_ID=1)...
pgslave2_1_645dab692bcb | >>> Starting repmgr daemon...
pgslave2_1_645dab692bcb | INFO: looking for configuration file in /etc
pgslave2_1_645dab692bcb | INFO: configuration file found at: "/etc/repmgr.conf"
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [NOTICE] repmgrd (repmgr 4.0.6) starting up
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [INFO] connecting to database "user=replication_user password=replication_pass host=pgslave2 dbname=replication_db port=5432 connect_timeout=2"
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [NOTICE] starting monitoring of node "node3" (ID: 3)
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [INFO] executing notification command for event "repmgrd_start"
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [DETAIL] command is:
pgslave2_1_645dab692bcb |   /usr/local/bin/cluster/repmgr/events/router.sh 3 repmgrd_start 1 "2018-12-26 23:58:17.577402+00" "monitoring connection to upstream node \"node1\" (node ID: 1)"
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [INFO] monitoring connection to upstream node "node1" (node ID: 1)
pgslave2_1_645dab692bcb | >>> Setting up STOP handlers...
pgslave2_1_645dab692bcb | >>> STARTING SSH (if required)...
pgslave2_1_645dab692bcb | cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
pgslave2_1_645dab692bcb | No pre-populated ssh keys!
pgslave2_1_645dab692bcb | >>> SSH is not enabled!
pgslave2_1_645dab692bcb | >>> STARTING POSTGRES...
pgslave2_1_645dab692bcb | >>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
pgslave2_1_645dab692bcb | >>> TUNING UP POSTGRES...
pgslave2_1_645dab692bcb | >>> Cleaning data folder which might have some garbage...
pgslave2_1_645dab692bcb | >>> Auto-detected master name: ''
pgslave2_1_645dab692bcb | >>> Setting up repmgr...
pgslave2_1_645dab692bcb | >>> Setting up repmgr config file '/etc/repmgr.conf'...
pgslave2_1_645dab692bcb | >>> Setting up upstream node...
pgslave2_1_645dab692bcb | cat: /var/lib/postgresql/data/standby.lock: No such file or directory
pgslave2_1_645dab692bcb | >>> Previously Locked standby upstream node LOCKED_STANDBY=''
pgslave2_1_645dab692bcb | >>> Waiting for upstream postgres server...
pgslave2_1_645dab692bcb | >>> Wait schema replication_db.repmgr on pgslave1:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave2_1_645dab692bcb | psql: could not connect to server: Connection refused
pgslave2_1_645dab692bcb |       Is the server running on host "pgslave1" (172.20.0.5) and accepting
pgslave2_1_645dab692bcb |       TCP/IP connections on port 5432?
pgslave2_1_645dab692bcb | >>>>>> Host pgslave1:5432 is not accessible (will try 30 times more)
pgslave2_1_645dab692bcb | psql: could not connect to server: Connection refused
pgslave2_1_645dab692bcb |       Is the server running on host "pgslave1" (172.20.0.5) and accepting
pgslave2_1_645dab692bcb |       TCP/IP connections on port 5432?
pgslave2_1_645dab692bcb | >>>>>> Host pgslave1:5432 is not accessible (will try 29 times more)
pgslave2_1_645dab692bcb | psql: could not connect to server: Connection refused
pgslave2_1_645dab692bcb |       Is the server running on host "pgslave1" (172.20.0.5) and accepting
pgslave2_1_645dab692bcb |       TCP/IP connections on port 5432?
pgslave2_1_645dab692bcb | >>>>>> Host pgslave1:5432 is not accessible (will try 28 times more)
pgslave2_1_645dab692bcb | psql: could not connect to server: Connection refused
pgslave2_1_645dab692bcb |       Is the server running on host "pgslave1" (172.20.0.5) and accepting
pgslave2_1_645dab692bcb |       TCP/IP connections on port 5432?
pgslave2_1_645dab692bcb | >>>>>> Host pgslave1:5432 is not accessible (will try 27 times more)
pgslave2_1_645dab692bcb | >>>>>> Schema replication_db.repmgr exists on host pgslave1:5432!
pgslave2_1_645dab692bcb | >>> Waiting for upstream node id replication_db on pgslave1:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave2_1_645dab692bcb | >>>>>> Upstream node id is still not accessible on host pgslave1:5432 (will try 30 times more)
pgslave2_1_645dab692bcb | >>>>>> Upstream node id is still not accessible on host pgslave1:5432 (will try 29 times more)
pgslave2_1_645dab692bcb | >>>>>> Upstream node id is still not accessible on host pgslave1:5432 (will try 28 times more)
pgslave2_1_645dab692bcb | >>>>>> Upstream node id 2 received from host pgslave1:5432!
pgslave2_1_645dab692bcb | >>> REPLICATION_UPSTREAM_NODE_ID=2
pgslave2_1_645dab692bcb | >>> Sending in background postgres start...
pgslave2_1_645dab692bcb | >>> Waiting for upstream postgres server...
pgslave2_1_645dab692bcb | >>> Wait schema replication_db.repmgr on pgslave1:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave2_1_645dab692bcb | >>>>>> Schema replication_db.repmgr exists on host pgslave1:5432!
pgslave2_1_645dab692bcb | >>> Starting standby node...
pgslave2_1_645dab692bcb | >>> Instance hasn't been set up yet.
pgslave2_1_645dab692bcb | >>> Clonning primary node...
pgslave2_1_645dab692bcb | >>> Waiting for upstream postgres server...
pgslave2_1_645dab692bcb | >>> Wait schema replication_db.repmgr on pgslave1:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave2_1_645dab692bcb | NOTICE: destination directory "/var/lib/postgresql/data" provided
pgslave2_1_645dab692bcb | INFO: connecting to source node
pgslave2_1_645dab692bcb | DETAIL: connection string is: host=pgslave1 user=replication_user port=5432 dbname=replication_db
pgslave2_1_645dab692bcb | DETAIL: current installation size is 37 MB
pgslave2_1_645dab692bcb | >>>>>> Schema replication_db.repmgr exists on host pgslave1:5432!
pgslave2_1_645dab692bcb | >>> Waiting for cloning on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
pgslave2_1_645dab692bcb | >>> Replicated: 0
pgslave2_1_645dab692bcb | INFO: checking and correcting permissions on existing directory "/var/lib/postgresql/data"
pgslave2_1_645dab692bcb | NOTICE: starting backup (using pg_basebackup)...
pgslave2_1_645dab692bcb | INFO: executing:
pgslave2_1_645dab692bcb |   /usr/lib/postgresql/10/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/data -h pgslave1 -p 5432 -U replication_user -c fast -X stream -S repmgr_slot_3
pgslave2_1_645dab692bcb | NOTICE: creating replication slot "repmgr_slot_3" on upstream node 1
pgslave2_1_645dab692bcb | NOTICE: replication slot "repmgr_slot_3" deleted on source node
pgslave2_1_645dab692bcb | NOTICE: standby clone (using pg_basebackup) complete
pgslave2_1_645dab692bcb | NOTICE: you can now start your PostgreSQL server
pgslave2_1_645dab692bcb | HINT: for example: pg_ctl -D /var/lib/postgresql/data start
pgslave2_1_645dab692bcb | HINT: after starting the server, you need to register this standby with "repmgr standby register"
pgslave2_1_645dab692bcb | INFO: executing notification command for event "standby_clone"
pgslave2_1_645dab692bcb | DETAIL: command is:
pgslave2_1_645dab692bcb |   /usr/local/bin/cluster/repmgr/events/router.sh 3 standby_clone 1 "2018-12-26 23:57:47.269342+00" "cloned from host \"pgslave1\", port 5432; backup method: pg_basebackup; --force: Y"
pgslave2_1_645dab692bcb | [REPMGR EVENT] Node id: 3; Event type: standby_clone; Success [1|0]: 1; Time: 2018-12-26 23:57:47.269342+00;  Details: cloned from host "pgslave1", port 5432; backup method: pg_basebackup; --force: Y
pgslave2_1_645dab692bcb | >>> Configuring /var/lib/postgresql/data/postgresql.conf
pgslave2_1_645dab692bcb | >>>>>> Will add configs to the exists file
pgslave2_1_645dab692bcb | >>>>>> Adding config 'listen_addresses'=''*''
pgslave2_1_645dab692bcb | >>>>>> Adding config 'shared_preload_libraries'=''repmgr''
pgslave2_1_645dab692bcb | >>> Starting postgres...
pgslave2_1_645dab692bcb | >>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
pgslave2_1_645dab692bcb | >>> Recovery is in progress:
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.303 UTC [195] LOG:  listening on IPv4 address "0.0.0.0", port 5432
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.304 UTC [195] LOG:  listening on IPv6 address "::", port 5432
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.308 UTC [195] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.340 UTC [204] LOG:  database system was interrupted while in recovery at log time 2018-12-26 23:57:06 UTC
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.340 UTC [204] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.360 UTC [204] LOG:  entering standby mode
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.362 UTC [204] LOG:  redo starts at 0/3000028
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.363 UTC [204] LOG:  consistent recovery state reached at 0/5000908
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.363 UTC [204] LOG:  invalid record length at 0/5000908: wanted 24, got 0
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.364 UTC [195] LOG:  database system is ready to accept read only connections
pgslave2_1_645dab692bcb | 2018-12-26 23:57:47.376 UTC [208] LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
pgslave2_1_645dab692bcb | 2018-12-26 23:57:48.097 UTC [209] LOG:  incomplete startup packet
pgslave2_1_645dab692bcb | >>> Cloning is done
pgslave2_1_645dab692bcb | >>>>>> WAL id: 000000010000000000000005
pgslave2_1_645dab692bcb | >>>>>> WAL_RECEIVER_FLAG=1!
pgslave2_1_645dab692bcb | >>> Not in recovery state (anymore)
pgslave2_1_645dab692bcb | >>> Waiting for local postgres server start...
pgslave2_1_645dab692bcb | >>> Wait schema replication_db.public on pgslave2:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
pgslave2_1_645dab692bcb | >>>>>> Schema replication_db.public exists on host pgslave2:5432!
pgslave2_1_645dab692bcb | >>> Unregister the node if it was done before
pgslave2_1_645dab692bcb | ERROR:  cannot execute DELETE in a read-only transaction
pgslave2_1_645dab692bcb | >>> Registering node with role standby
pgslave2_1_645dab692bcb | INFO: connecting to local node "node3" (ID: 3)
pgslave2_1_645dab692bcb | INFO: connecting to primary database
pgslave2_1_645dab692bcb | WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
pgslave2_1_645dab692bcb | INFO: executing notification command for event "standby_register"
pgslave2_1_645dab692bcb | DETAIL: command is:
pgslave2_1_645dab692bcb |   /usr/local/bin/cluster/repmgr/events/router.sh 3 standby_register 1 "2018-12-26 23:58:17.474092+00" "standby registration succeeded (-F/--force option was used)"
pgslave2_1_645dab692bcb | [REPMGR EVENT] Node id: 3; Event type: standby_register; Success [1|0]: 1; Time: 2018-12-26 23:58:17.474092+00;  Details: standby registration succeeded (-F/--force option was used)
pgslave2_1_645dab692bcb | INFO: standby registration complete
pgslave2_1_645dab692bcb | NOTICE: standby node "node3" (id: 3) successfully registered
pgslave2_1_645dab692bcb |  Locking standby (NEW_UPSTREAM_NODE_ID=1)...
pgslave2_1_645dab692bcb | >>> Starting repmgr daemon...
pgslave2_1_645dab692bcb | INFO: looking for configuration file in /etc
pgslave2_1_645dab692bcb | INFO: configuration file found at: "/etc/repmgr.conf"
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [NOTICE] repmgrd (repmgr 4.0.6) starting up
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [INFO] connecting to database "user=replication_user password=replication_pass host=pgslave2 dbname=replication_db port=5432 connect_timeout=2"
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [NOTICE] starting monitoring of node "node3" (ID: 3)
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [INFO] executing notification command for event "repmgrd_start"
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [DETAIL] command is:
pgslave2_1_645dab692bcb |   /usr/local/bin/cluster/repmgr/events/router.sh 3 repmgrd_start 1 "2018-12-26 23:58:17.577402+00" "monitoring connection to upstream node \"node1\" (node ID: 1)"
pgslave2_1_645dab692bcb | [2018-12-26 23:58:17] [INFO] monitoring connection to upstream node "node1" (node ID: 1)

pgslave1_1_d43e93f50353 | >>> Setting up STOP handlers...
pgslave1_1_d43e93f50353 | >>> STARTING SSH (if required)...
pgslave1_1_d43e93f50353 | >>> TUNING UP SSH CLIENT...
pgslave1_1_d43e93f50353 | >>> STARTING SSH SERVER...
pgslave1_1_d43e93f50353 | >>> STARTING POSTGRES...
pgslave1_1_d43e93f50353 | >>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
pgslave1_1_d43e93f50353 | >>> TUNING UP POSTGRES...
pgslave1_1_d43e93f50353 | >>> Cleaning data folder which might have some garbage...
pgslave1_1_d43e93f50353 | >>> Check all partner nodes for common upstream node...
pgslave1_1_d43e93f50353 | >>>>>> Checking NODE=pgmaster...
pgslave1_1_d43e93f50353 | psql: could not connect to server: Connection refused
pgslave1_1_d43e93f50353 |       Is the server running on host "pgmaster" (172.20.0.4) and accepting
pgslave1_1_d43e93f50353 |       TCP/IP connections on port 5432?
pgslave1_1_d43e93f50353 | >>>>>> Skipping: failed to get master from the node!
pgslave1_1_d43e93f50353 | >>>>>> Checking NODE=pgslave1...
pgslave1_1_d43e93f50353 | psql: could not connect to server: Connection refused
pgslave1_1_d43e93f50353 |       Is the server running on host "pgslave1" (172.20.0.5) and accepting
pgslave1_1_d43e93f50353 |       TCP/IP connections on port 5432?
pgslave1_1_d43e93f50353 | >>>>>> Skipping: failed to get master from the node!
pgslave1_1_d43e93f50353 | >>>>>> Checking NODE=pgslave3...
pgslave1_1_d43e93f50353 | psql: could not connect to server: Connection refused
pgslave1_1_d43e93f50353 |       Is the server running on host "pgslave3" (172.20.0.3) and accepting
pgslave1_1_d43e93f50353 |       TCP/IP connections on port 5432?
pgslave1_1_d43e93f50353 | >>>>>> Skipping: failed to get master from the node!
pgslave1_1_d43e93f50353 | >>> Auto-detected master name: ''
pgslave1_1_d43e93f50353 | >>> Setting up repmgr...
pgslave1_1_d43e93f50353 | >>> Setting up repmgr config file '/etc/repmgr.conf'...
pgslave1_1_d43e93f50353 | >>> Setting up upstream node...
pgslave1_1_d43e93f50353 | cat: /var/lib/postgresql/data/standby.lock: No such file or directory
pgslave1_1_d43e93f50353 | >>> Previously Locked standby upstream node LOCKED_STANDBY=''
pgslave1_1_d43e93f50353 | >>> Waiting for upstream postgres server...
pgslave1_1_d43e93f50353 | >>> Wait schema replication_db.repmgr on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave1_1_d43e93f50353 | psql: could not connect to server: Connection refused
pgslave1_1_d43e93f50353 |       Is the server running on host "pgmaster" (172.20.0.4) and accepting
pgslave1_1_d43e93f50353 |       TCP/IP connections on port 5432?
pgslave1_1_d43e93f50353 | >>>>>> Host pgmaster:5432 is not accessible (will try 30 times more)
pgslave1_1_d43e93f50353 | >>>>>> Schema replication_db.repmgr is still not accessible on host pgmaster:5432 (will try 29 times more)
pgslave1_1_d43e93f50353 | >>>>>> Schema replication_db.repmgr is still not accessible on host pgmaster:5432 (will try 28 times more)
pgslave1_1_d43e93f50353 | >>>>>> Schema replication_db.repmgr exists on host pgmaster:5432!
pgslave1_1_d43e93f50353 | >>> Waiting for upstream node id replication_db on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave1_1_d43e93f50353 | >>>>>> Upstream node id 1 received from host pgmaster:5432!
pgslave1_1_d43e93f50353 | >>> REPLICATION_UPSTREAM_NODE_ID=1
pgslave1_1_d43e93f50353 | >>> Sending in background postgres start...
pgslave1_1_d43e93f50353 | >>> Waiting for upstream postgres server...
pgslave1_1_d43e93f50353 | >>> Wait schema replication_db.repmgr on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave1_1_d43e93f50353 | >>>>>> Schema replication_db.repmgr exists on host pgmaster:5432!
pgslave1_1_d43e93f50353 | >>> Starting standby node...
pgslave1_1_d43e93f50353 | >>> Instance hasn't been set up yet.
pgslave1_1_d43e93f50353 | >>> Clonning primary node...
pgslave1_1_d43e93f50353 | >>> Waiting for upstream postgres server...
pgslave1_1_d43e93f50353 | >>> Wait schema replication_db.repmgr on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
pgslave1_1_d43e93f50353 | NOTICE: destination directory "/var/lib/postgresql/data" provided
pgslave1_1_d43e93f50353 | INFO: connecting to source node
pgslave1_1_d43e93f50353 | DETAIL: connection string is: host=pgmaster user=replication_user port=5432 dbname=replication_db
pgslave1_1_d43e93f50353 | DETAIL: current installation size is 37 MB
pgslave1_1_d43e93f50353 | INFO: checking and correcting permissions on existing directory "/var/lib/postgresql/data"
pgslave1_1_d43e93f50353 | NOTICE: starting backup (using pg_basebackup)...
pgslave1_1_d43e93f50353 | INFO: executing:
pgslave1_1_d43e93f50353 |   /usr/lib/postgresql/10/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/data -h pgmaster -p 5432 -U replication_user -c fast -X stream -S repmgr_slot_2
pgslave1_1_d43e93f50353 | >>>>>> Schema replication_db.repmgr exists on host pgmaster:5432!
pgslave1_1_d43e93f50353 | >>> Waiting for cloning on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
pgslave1_1_d43e93f50353 | >>> Replicated: 0
pgslave1_1_d43e93f50353 | NOTICE: standby clone (using pg_basebackup) complete
pgslave1_1_d43e93f50353 | NOTICE: you can now start your PostgreSQL server
pgslave1_1_d43e93f50353 | HINT: for example: pg_ctl -D /var/lib/postgresql/data start
pgslave1_1_d43e93f50353 | HINT: after starting the server, you need to register this standby with "repmgr standby register"
pgslave1_1_d43e93f50353 | INFO: executing notification command for event "standby_clone"
pgslave1_1_d43e93f50353 | DETAIL: command is:
pgslave1_1_d43e93f50353 |   /usr/local/bin/cluster/repmgr/events/router.sh 2 standby_clone 1 "2018-12-26 23:57:06.86315+00" "cloned from host \"pgmaster\", port 5432; backup method: pg_basebackup; --force: Y"
pgslave1_1_d43e93f50353 | [REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-12-26 23:57:06.86315+00;  Details: cloned from host "pgmaster", port 5432; backup method: pg_basebackup; --force: Y
pgslave1_1_d43e93f50353 | >>> Configuring /var/lib/postgresql/data/postgresql.conf
pgslave1_1_d43e93f50353 | >>>>>> Will add configs to the exists file
pgslave1_1_d43e93f50353 | >>>>>> Adding config 'max_replication_slots'='10'
pgslave1_1_d43e93f50353 | >>>>>> Adding config 'shared_preload_libraries'=''repmgr''
pgslave1_1_d43e93f50353 | >>> Starting postgres...
pgslave1_1_d43e93f50353 | >>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
pgslave1_1_d43e93f50353 | >>> Recovery is in progress:
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:06.984 UTC [187] LOG:  listening on IPv4 address "0.0.0.0", port 5432
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:06.985 UTC [187] LOG:  listening on IPv6 address "::", port 5432
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:06.986 UTC [187] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:07.050 UTC [197] LOG:  database system was interrupted; last known up at 2018-12-26 23:57:06 UTC
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:07.068 UTC [197] LOG:  entering standby mode
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:07.071 UTC [197] LOG:  redo starts at 0/3000028
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:07.071 UTC [197] LOG:  consistent recovery state reached at 0/4000248
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:07.072 UTC [187] LOG:  database system is ready to accept read only connections
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:07.085 UTC [201] LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
pgslave1_1_d43e93f50353 | 2018-12-26 23:57:07.710 UTC [202] LOG:  incomplete startup packet
pgslave1_1_d43e93f50353 | >>> Cloning is done
pgslave1_1_d43e93f50353 | >>>>>> WAL id: 000000010000000000000005
pgslave1_1_d43e93f50353 | >>>>>> WAL_RECEIVER_FLAG=1!
pgslave1_1_d43e93f50353 | >>> Not in recovery state (anymore)
pgslave1_1_d43e93f50353 | >>> Waiting for local postgres server start...
pgslave1_1_d43e93f50353 | >>> Wait schema replication_db.public on pgslave1:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
pgslave1_1_d43e93f50353 | >>>>>> Schema replication_db.public exists on host pgslave1:5432!
pgslave1_1_d43e93f50353 | >>> Unregister the node if it was done before
pgslave1_1_d43e93f50353 | DELETE 0
pgslave1_1_d43e93f50353 | >>> Registering node with role standby
pgslave1_1_d43e93f50353 | INFO: connecting to local node "node2" (ID: 2)
pgslave1_1_d43e93f50353 | INFO: connecting to primary database
pgslave1_1_d43e93f50353 | WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
pgslave1_1_d43e93f50353 | INFO: executing notification command for event "standby_register"
pgslave1_1_d43e93f50353 | DETAIL: command is:
pgslave1_1_d43e93f50353 |   /usr/local/bin/cluster/repmgr/events/router.sh 2 standby_register 1 "2018-12-26 23:57:37.128943+00" "standby registration succeeded (-F/--force option was used)"
pgslave1_1_d43e93f50353 | [REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-12-26 23:57:37.128943+00;  Details: standby registration succeeded (-F/--force option was used)
pgslave1_1_d43e93f50353 | INFO: standby registration complete
pgslave1_1_d43e93f50353 | NOTICE: standby node "node2" (id: 2) successfully registered
pgslave1_1_d43e93f50353 |  Locking standby (NEW_UPSTREAM_NODE_ID=1)...
pgslave1_1_d43e93f50353 | >>> Starting repmgr daemon...
pgslave1_1_d43e93f50353 | INFO: looking for configuration file in /etc
pgslave1_1_d43e93f50353 | INFO: configuration file found at: "/etc/repmgr.conf"
pgslave1_1_d43e93f50353 | [2018-12-26 23:57:37] [NOTICE] repmgrd (repmgr 4.0.6) starting up
pgslave1_1_d43e93f50353 | [2018-12-26 23:57:37] [INFO] connecting to database "user=replication_user password=replication_pass host=pgslave1 dbname=replication_db port=5432 connect_timeout=2"
pgslave1_1_d43e93f50353 | [2018-12-26 23:57:37] [NOTICE] starting monitoring of node "node2" (ID: 2)
pgslave1_1_d43e93f50353 | [2018-12-26 23:57:37] [INFO] executing notification command for event "repmgrd_start"
pgslave1_1_d43e93f50353 | [2018-12-26 23:57:37] [DETAIL] command is:
pgslave1_1_d43e93f50353 |   /usr/local/bin/cluster/repmgr/events/router.sh 2 repmgrd_start 1 "2018-12-26 23:57:37.238597+00" "monitoring connection to upstream node \"node1\" (node ID: 1)"
pgslave1_1_d43e93f50353 | [2018-12-26 23:57:37] [INFO] monitoring connection to upstream node "node1" (node ID: 1)
pgslave1_1_d43e93f50353 | 2018-12-26 23:58:17.426 UTC [314] ERROR:  cannot execute DELETE in a read-only transaction
pgslave1_1_d43e93f50353 | 2018-12-26 23:58:17.426 UTC [314] STATEMENT:  DELETE FROM repmgr.nodes WHERE conninfo LIKE '%host=pgslave2%'

Dec 27 '18 00:12 utarn

I'm having the same issue...

Feb 16 '21 21:02 dwjorgeb