
Tutorial Part II.

DimCitus opened this issue on Sep 29, 2020 • 2 comments

Add an interesting story where we migrate from one region to another thanks to a short list of pg_auto_failover commands. This time, the story takes place in Europe.

DimCitus · Sep 29 '20 09:09

The first tutorial is already heavy on az ... commands: we spend a non-trivial amount of time setting the stage for the demo. This second tutorial is even worse, with quite a complex multi-site setup now. I think we still want the complex setup, but it would be better if we didn't spend quite so much time preparing it in our tutorial.

Also, the resource group name prefix and the regions should ideally be parameters. Maybe some other things could be parameters too, but this is a demo, so those two elements would be a good start.

Given this, I'm now thinking of adding a pg_autoctl do azure series of commands, with as few commands as possible. It could be pg_autoctl do azure tutorial <part>, with part being 1 or 2 at the moment. I also think we want to be able to implement multi-step tutorials where we prep a region, then another one with an async standby, then add more nodes to the second region, then switch production over to a new region, etc. So I'm thinking we should have a top-level command per documentation top-level entry point (tutorial 1, tutorial 2), and also lower-level commands to compose together:

$ pg_autoctl do azure create tutorial 1 --prefix ha-demo --location eastus
$ pg_autoctl do azure create tutorial 2 --prefix ha-demo --location francecentral --location westeurope --location northeurope

# creating a region creates the VPN connection with pre-existing regions

$ pg_autoctl do azure create region --prefix ha-demo --name paris --location francecentral --monitor --nodes 2
$ pg_autoctl do azure create region --prefix ha-demo --name amsterdam --location westeurope --nodes 1 --no-wait
$ pg_autoctl do azure create region --prefix ha-demo --name dublin --empty --no-wait

# if we have an azure-${prefix}.ini file we can discover regions and wait until 
# gateways are available and then create the VPNs to cross-connect everything
# without asking the user for input such as the region list, etc.

$ pg_autoctl do azure create vpn --prefix ha-demo --all --wait

$ pg_autoctl do azure create node --prefix ha-demo --region dublin --replication-quorum false --candidate-priority 0

$ pg_autoctl do azure ls
$ pg_autoctl do azure ssh <node name>
$ pg_autoctl do azure show state --watch

$ pg_autoctl do azure set region quorum
$ pg_autoctl do azure set region priority

We might want to create an azure-${prefix}.ini file with some of the information, such as the list of regions with locations and names, the global prefix in use, where the monitor is, etc., so that we can compose commands more easily. Node names would be automatically derived from the region name with a letter appended.
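As a concrete sketch of what that state file could look like, here is a hypothetical azure-ha-demo.ini; the keys, section names, and the dublin location are made up for illustration, and only the prefix, regions, and node counts come from the commands above:

$ cat azure-ha-demo.ini
# hypothetical sketch only: actual keys and layout still to be decided
[global]
prefix = ha-demo
monitor-region = paris

[region:paris]
location = francecentral
nodes = 2

[region:amsterdam]
location = westeurope
nodes = 1

[region:dublin]
location = northeurope
nodes = 0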

Given all that, we should have a tutorial version that just uses our top-level commands (create tutorial, ls, show state), and a documentation page with all the commands that our tool runs. Having a --dry-run or --script option that generates the azure commands would then be quite useful to produce that detailed step-by-step documentation.

DimCitus · Oct 01 '20 11:10

Agreed that all those az commands obscure the content. However, bundling them into the pg_autoctl tool sounds like a confusing mixture of concerns. First, as a potentially vendor-neutral tool, it's odd for it to have that connection to Azure. Second, the Azure interface may change, which would then require going through a whole release process to fix the tool. Third, hiding the Azure steps behind pg_autoctl subcommands prevents the user from understanding what's happening at all.

Is it possible to either shorten the commands by bundling repetitive ones into loops or, failing that, to make them downloadable scripts linked from the tutorial?
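For instance, something along these lines could fold the repeated provisioning calls into a single loop; this is only a sketch, abridged to two of the provisioning steps, with the group and VM names taken from the ha-demo-dim-paris example further down in this thread:

# sketch: run the same provisioning step on each VM with a simple loop
for vm in ha-demo-dim-paris-monitor ha-demo-dim-paris-a ha-demo-dim-paris-b
do
    az vm run-command invoke \
       --resource-group ha-demo-dim-paris \
       --name "$vm" \
       --command-id RunShellScript \
       --scripts "curl https://install.citusdata.com/community/deb.sh | sudo bash" \
                 "sudo apt-get install -q -y postgresql-11-auto-failover-1.4" &
done
wait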

I agree with you that some users will prefer to see everything, while others will want a shortcut and not bother with the details. Also, it's important that we can easily reproduce those steps for our QA needs, and I'm thinking about extending the provisioning we currently have so that it also works from any git branch, rather than only from packages.
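For that git-branch provisioning, the rough idea would be to build from source on the VMs rather than installing the released package; a minimal sketch, assuming the usual Postgres 11 build dependencies are enough, could look like this:

# sketch: provision pg_auto_failover from a git branch instead of a package
sudo apt-get install -q -y git build-essential postgresql-server-dev-11
git clone --branch master https://github.com/citusdata/pg_auto_failover.git
cd pg_auto_failover
make && sudo make install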

To try and cater to all needs, I am currently working on https://github.com/citusdata/pg_auto_failover/pull/442, which adds a --script option to most commands, allowing them to produce the script that will end up in the tutorial, or even for users to review, etc.

Here are a couple of examples:

$ PG_AUTOCTL_DEBUG=1 pg_autoctl do azure create region --prefix ha-demo-dim --region paris --location francecentral --monitor --nodes 2 --script
18:58:25 98894 INFO  Fetching resources that might already exist from a previous run
18:58:25 98894 INFO   /usr/local/bin/az resource list --output json --query [?resourceGroup=='ha-demo-dim-paris'].{ name: name, resourceType: type }
18:58:26 98894 INFO  Creating group "ha-demo-dim-paris" in location "francecentral"
18:58:26 98894 INFO  Skipping creation of vnet "ha-demo-dim-paris-net" which already exist
18:58:26 98894 INFO  Skipping creation of nsg "ha-demo-dim-paris-nsg" which already exist
18:58:26 98894 INFO  Creating network nsg rules "ha-demo-dim-paris-ssh-and-pg" for our IP address "92.151.108.38" for ports 22 and 5432
18:58:26 98894 INFO  Creating network subnet "ha-demo-dim-paris-subnet" using address prefix "10.11.11.0/24"
18:58:26 98894 INFO   /usr/local/bin/az vm list-ip-addresses --resource-group ha-demo-dim-paris --query [] [] . { name: virtualMachine.name, "public address": virtualMachine.network.publicIpAddresses[0].ipAddress, "private address": virtualMachine.network.privateIpAddresses[0] } -o json
18:58:28 98894 INFO  Creating Virtual Machines for a monitor and 2 Postgres nodes, in parallel
18:58:28 98894 INFO  Skipping creation of VM "ha-demo-dim-paris-monitor", which already exists with public IP address 51.11.243.15
18:58:28 98894 INFO  Skipping creation of VM "ha-demo-dim-paris-a", which already exists with public IP address 51.11.246.76
18:58:28 98894 INFO  Skipping creation of VM "ha-demo-dim-paris-b", which already exists with public IP address 51.11.246.111
18:58:28 98894 INFO  Provisioning 3 Virtual Machines in parallel
18:58:28 98894 INFO  Provisioning Virtual Machine "ha-demo-dim-paris-monitor"
18:58:28 98894 INFO  Provisioning Virtual Machine "ha-demo-dim-paris-a"
18:58:28 98894 INFO  Provisioning Virtual Machine "ha-demo-dim-paris-b"
# azure commands for pg_auto_failover demo
 /usr/local/bin/az group create --name ha-demo-dim-paris --location francecentral
 /usr/local/bin/az network nsg rule create --resource-group ha-demo-dim-paris --nsg-name ha-demo-dim-paris-nsg --name ha-demo-dim-paris-ssh-and-pg --access allow --protocol Tcp --direction Inbound --priority 100 --source-address-prefixes 92.151.108.38 --source-port-range "*" --destination-address-prefix "*" --destination-port-ranges 22 5432
 /usr/local/bin/az network vnet subnet create --resource-group ha-demo-dim-paris --vnet-name ha-demo-dim-paris-net --name ha-demo-dim-paris-subnet --address-prefixes 10.11.11.0/24 --network-security-group ha-demo-dim-paris-nsg
 /usr/local/bin/az vm run-command invoke --resource-group ha-demo-dim-paris --name ha-demo-dim-paris-monitor --command-id RunShellScript --scripts "curl https://install.citusdata.com/community/deb.sh | sudo bash" "sudo apt-get install -q -y postgresql-common" "echo 'create_main_cluster = false' | sudo tee -a /etc/postgresql-common/createcluster.conf" "sudo apt-get install -q -y postgresql-11-auto-failover-1.4" "sudo usermod -a -G postgres ha-admin" &
 /usr/local/bin/az vm run-command invoke --resource-group ha-demo-dim-paris --name ha-demo-dim-paris-a --command-id RunShellScript --scripts "curl https://install.citusdata.com/community/deb.sh | sudo bash" "sudo apt-get install -q -y postgresql-common" "echo 'create_main_cluster = false' | sudo tee -a /etc/postgresql-common/createcluster.conf" "sudo apt-get install -q -y postgresql-11-auto-failover-1.4" "sudo usermod -a -G postgres ha-admin" &
 /usr/local/bin/az vm run-command invoke --resource-group ha-demo-dim-paris --name ha-demo-dim-paris-b --command-id RunShellScript --scripts "curl https://install.citusdata.com/community/deb.sh | sudo bash" "sudo apt-get install -q -y postgresql-common" "echo 'create_main_cluster = false' | sudo tee -a /etc/postgresql-common/createcluster.conf" "sudo apt-get install -q -y postgresql-11-auto-failover-1.4" "sudo usermod -a -G postgres ha-admin" &
wait

And then I could also create the nodes:

$ PG_AUTOCTL_DEBUG=1 pg_autoctl do azure create nodes --prefix ha-demo-dim --region paris --location francecentral --monitor --nodes 2 --script
18:59:35 98908 INFO   /usr/local/bin/az vm list-ip-addresses --resource-group ha-demo-dim-paris --query [] [] . { name: virtualMachine.name, "public address": virtualMachine.network.publicIpAddresses[0].ipAddress, "private address": virtualMachine.network.privateIpAddresses[0] } -o json
# azure commands for pg_auto_failover demo
 /usr/bin/ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile /dev/null -l ha-admin 51.11.243.15 -- pg_autoctl create monitor --auth trust --ssl-self-signed --pgdata /home/ha-admin/monitor --pgctl /usr/lib/postgresql/11/bin/pg_ctl
 /usr/bin/ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile /dev/null -l ha-admin 51.11.243.15 -- pg_autoctl -q show systemd --pgdata /home/ha-admin/monitor > pgautofailover.service; sudo mv pgautofailover.service /etc/systemd/system; sudo systemctl daemon-reload; sudo systemctl enable pgautofailover; sudo systemctl start pgautofailover
 /usr/bin/ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile /dev/null -l ha-admin 51.11.246.76 -- pg_autoctl create postgres --pgctl /usr/lib/postgresql/11/bin/pg_ctl --pgdata /home/ha-admin/pgdata --auth trust --ssl-self-signed --username ha-admin --dbname appdb --hostname 10.11.11.5 --name paris-a --monitor 'postgres://[email protected]/pg_auto_failover?sslmode=require'
 /usr/bin/ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile /dev/null -l ha-admin 51.11.246.76 -- pg_autoctl -q show systemd --pgdata /home/ha-admin/pgdata > pgautofailover.service; sudo mv pgautofailover.service /etc/systemd/system; sudo systemctl daemon-reload; sudo systemctl enable pgautofailover; sudo systemctl start pgautofailover
 /usr/bin/ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile /dev/null -l ha-admin 51.11.246.111 -- pg_autoctl create postgres --pgctl /usr/lib/postgresql/11/bin/pg_ctl --pgdata /home/ha-admin/pgdata --auth trust --ssl-self-signed --username ha-admin --dbname appdb --hostname 10.11.11.6 --name paris-b --monitor 'postgres://[email protected]/pg_auto_failover?sslmode=require'
 /usr/bin/ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile /dev/null -l ha-admin 51.11.246.111 -- pg_autoctl -q show systemd --pgdata /home/ha-admin/pgdata > pgautofailover.service; sudo mv pgautofailover.service /etc/systemd/system; sudo systemctl daemon-reload; sudo systemctl enable pgautofailover; sudo systemctl start pgautofailover
 /usr/bin/ssh -t -o StrictHostKeyChecking=no -o UserKnownHostsFile /dev/null -l ha-admin 51.11.243.15 -- watch -n 0.2 pg_autoctl show state --pgdata /home/ha-admin/monitor

In my session here, I also used the commands to actually create all the Azure resources, then run pg_autoctl create monitor|postgres, and then set up the systemd integration, all with those two commands; the same as just above, only without the --script option.

About the tutorial, what I have in mind is offering multiple choices to the reader:

  • each main piece of the tutorial scripts would have a name, and its own documentation page
  • instead of just embedding the script, we could provide the option to run a simple command (see the sketch after this list) or click a link and see all the azure CLI commands ready for copy/paste
  • we would keep the direct user commands for the parts we actually want to show in the tutorial.
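For instance, the "simple command" could just be the --script variant of a region creation, with its output saved to a file that the tutorial links to; this assumes the INFO lines go to stderr so that only the generated azure commands land on stdout, and the file name is only an example:

$ PG_AUTOCTL_DEBUG=1 pg_autoctl do azure create region --prefix ha-demo \
      --region paris --location francecentral --monitor --nodes 2 \
      --script > create-paris.sh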

So I hope that makes sense. Worst case, I will finish my current automation scripts anyway for QA, with the provisioning from git, so that we can easily deploy the current branch on Azure VMs and play around there, possibly targeting multiple regions.

DimCitus · Oct 02 '20 17:10