
To do proper switchover in cluster

Open paunin opened this issue 8 years ago • 3 comments

paunin avatar May 07 '17 15:05 paunin

Hello @paunin, manual switchover is currently not working, possibly because SSH is not working at all:

First I tried to run the command in a standby container:

lars@laptop:~/Downloads/PostDock$ docker -D exec postdock_pgsecondary2_1 su -c "repmgr standby switchover" - postgres
NOTICE: executing switchover on node "node3" (ID: 3)
WARNING: unable to connect to remote host "pgmaster" via SSH
ERROR: unable to connect via SSH to host "pgmaster", user ""
DEBU[0000] [hijack] End of stdout    

Then I tried to debug it interactively inside the master container:

lars@laptop:~/Downloads/PostDock$ docker exec -ti postdock_pgmaster_1 bash
root@82cdc571e72b:/# ssh pgsecondary1
ssh: connect to host pgsecondary1 port 22: Connection refused

I generated new SSH keys and enabled SSH in my docker-compose file:

version: '2'

services:
    pgmaster:
        restart: always
        image: registry.iznet/lk/hapg:ssh

        environment:
            SSH_ENABLE: 1
            NODE_ID: 1 # Integer number of node (not required if can be extracted from NODE_NAME var, e.g. node-45 => 1045)
            NODE_NAME: node1 # Node name
            CLUSTER_NODE_NETWORK_NAME: pgmaster # (default: hostname of the node)
            PARTNER_NODES: "pgmaster,pgsecondary1,pgsecondary2"
            REPLICATION_PRIMARY_HOST: pgmaster
            POSTGRES_PASSWORD: monkey_pass
            POSTGRES_USER: monkey_user
            POSTGRES_DB: monkey_db
            CLEAN_OVER_REWIND: 1
            MASTER_RESPONSE_TIMEOUT: 5
            RECONNECT_ATTEMPTS: 1
            CONFIGS: "listen_addresses:'*'"
                                  # in format variable1:value1[,variable2:value2[,...]] if CONFIGS_DELIMITER_SYMBOL=, and CONFIGS_ASSIGNMENT_SYMBOL=:
                                  # used for pgpool.conf file
            #defaults:
            CLUSTER_NAME: pg_cluster # default is pg_cluster
            REPLICATION_DB: replication_db # default is replication_db
            REPLICATION_USER: replication_user # default is replication_user
            REPLICATION_PASSWORD: replication_pass # default is replication_pass
        volumes:
            #- ./ssh:/var/lib/postgres/.ssh
            - ./ssh/:/home/postgres/.ssh/keys

    pgsecondary1:
        restart: always
        image: registry.iznet/lk/hapg:ssh
        environment:
            SSH_ENABLE: 1
            PARTNER_NODES: "pgmaster,pgsecondary1,pgsecondary2"
            REPLICATION_PRIMARY_HOST: pgmaster
            NODE_ID: 2
            NODE_NAME: node2
            CLUSTER_NODE_NETWORK_NAME: pgsecondary1
            CLEAN_OVER_REWIND: 1
            RECONNECT_ATTEMPTS: 1
            MASTER_RESPONSE_TIMEOUT: 5
        volumes:
            - ./ssh/:/home/postgres/.ssh/keys

    pgsecondary2:
        restart: always
        image: registry.iznet/lk/hapg:ssh
        environment:
            SSH_ENABLE: 1
            PARTNER_NODES: "pgmaster,pgsecondary1,pgsecondary2"
            REPLICATION_PRIMARY_HOST: pgmaster
            NODE_ID: 3
            NODE_NAME: node3
            CLUSTER_NODE_NETWORK_NAME: pgsecondary2
            CLEAN_OVER_REWIND: 1
            RECONNECT_ATTEMPTS: 1
            MASTER_RESPONSE_TIMEOUT: 5
        volumes:
            - ./ssh/:/home/postgres/.ssh/keys

Brice187 avatar Apr 22 '19 06:04 Brice187

I recovered SSH (see the test), but I think there are more issues apart from SSH.

paunin avatar Apr 27 '19 05:04 paunin

Hi Paunin,

We are trying to switch over a standby, and per the repmgr instructions the repmgrd daemon is not supposed to be running during the switchover.

However, when I try to shut down repmgrd, the pod crashes and goes into a loop.

Can you please let us know how to accomplish this?

Thanks, Bhuvan.

kbhuvanamohan avatar May 10 '19 11:05 kbhuvanamohan