Handle scaling / replacing of nodes in conjunction with provisioning ansible / weave network addresses / config
When adding nodes to the cluster we will run into problems with the way we assign the weave bridge address, and with scaling in general. We need to think through the approach for this. Some options off the top of my head (these need more thought):
- form a tighter integration with terraform to derive the values of the bridges / CIDR blocks (possibly).
- use a fact from the machine being provisioned that is dynamic / unique per host
- use the backend service discovery (consul) to populate the values, where consul would hold the master state of the cluster at any point in time
We also need to handle nodes being removed / replaced in the cluster without breaking the rest of the cluster configuration or the weave network.
This is discussed in https://github.com/Capgemini/Apollo/pull/351#commitcomment-11938353, https://github.com/Capgemini/Apollo/pull/351 and https://github.com/Capgemini/Apollo/pull/342.
The above bullet list covers it well - there are two choices for maintaining allocations.
- use a fact from the machine that is unique to compute the weave bridge and CIDR blocks
- use shared, coordinated state to track the allocations for purposes of consistency and GCing bridge and block values
For one data point, coreos fabric uses the second approach. If Apollo took this route, we'd want to use consul (with consistent reads) to allocate the bridge and blocks when machines are provisioned, and we'd likely need a GC process for reclaiming unused blocks.
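As a rough sketch of what that coordinated route could look like, the snippet below uses Consul's KV HTTP API (via `requests`) to claim a host index with a check-and-set loop and then derives a bridge / container block from it. The key name `apollo/weave/next-host-index` and the `10.2.0.0/16` numbering scheme are illustrative assumptions, not existing Apollo config:

```python
# Sketch only: atomically claim the next free host index from Consul using
# check-and-set, then derive a weave bridge / container block from that index.
import base64
import ipaddress
import requests

CONSUL = "http://localhost:8500"
KEY = "apollo/weave/next-host-index"   # hypothetical key


def allocate_host_index():
    """Claim the next free host index via a Consul KV check-and-set loop."""
    while True:
        # Consistent read so every provisioning run sees the latest counter.
        resp = requests.get(f"{CONSUL}/v1/kv/{KEY}", params={"consistent": ""})
        if resp.status_code == 404:
            current, cas = 0, 0            # key absent: cas=0 means "create if missing"
        else:
            entry = resp.json()[0]
            current = int(base64.b64decode(entry["Value"]))
            cas = entry["ModifyIndex"]
        # Bump the counter; Consul returns false if another provisioner raced us.
        put = requests.put(f"{CONSUL}/v1/kv/{KEY}",
                           params={"cas": cas}, data=str(current + 1))
        if put.json() is True:
            return current


def weave_addresses(index, base=ipaddress.ip_network("10.2.0.0/16")):
    """Map a host index onto a /24 container block and a bridge address (max 254 hosts here)."""
    block = ipaddress.ip_network((int(base.network_address) + (index + 1) * 256, 24))
    bridge = f"{block.network_address + 1}/16"
    return str(block), bridge


if __name__ == "__main__":
    idx = allocate_host_index()
    print(weave_addresses(idx))   # e.g. ('10.2.1.0/24', '10.2.1.1/16') for index 0
```

A real implementation would also need per-host allocation records (e.g. keyed by hostname) so a GC pass can reclaim blocks when a machine is destroyed.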
As another data point, Udacity chose the first approach (since we only ever plan to target AWS), computing the bridge and CIDR blocks from the VPC IP that AWS assigns to each host. This requires some careful CIDR math and VPC subnet construction.
You can use the same strategy described below with different starting VPC subnet offsets and masks to create multiple (or larger) clusters in a VPC. The example below scales to 1024 hosts and 1024 * 254 = 260,096 containers.
e.g. given an AWS subnet:
- CIDR: 10.0.4.0/22
- Number of addresses: 1024
- Range: 10.0.4.0 - 10.0.7.255

and an assigned host IP of 10.0.X.Y, we can compute the weave CIDR block for each host by shifting up the last two segments of the AWS-assigned IP:
- Weave CIDR block: 10.X.Y.0/24
- Weave bridge: 10.X.0.Y/14 (note that Y is never 0)
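For illustration, the arithmetic could look roughly like this in Python (it mirrors the example values above; it is not the actual code from the Udacity commit linked below):

```python
# Sketch of the "derive from the AWS-assigned IP" math described above.
# Given a host IP 10.0.X.Y inside the 10.0.4.0/22 VPC subnet, produce:
#   container block  10.X.Y.0/24
#   weave bridge     10.X.0.Y/14
import ipaddress

def weave_from_aws_ip(aws_ip: str):
    octets = ipaddress.ip_address(aws_ip).packed   # e.g. 10.0.X.Y
    x, y = octets[2], octets[3]                    # X = third octet, Y = fourth
    assert y != 0, "Y is never 0: AWS reserves the .0 address in a subnet"
    cidr_block = ipaddress.ip_network(f"10.{x}.{y}.0/24")
    bridge = f"10.{x}.0.{y}/14"
    return str(cidr_block), bridge

# For a host AWS assigned 10.0.5.37:
print(weave_from_aws_ip("10.0.5.37"))   # ('10.5.37.0/24', '10.5.0.37/14')
```

On the host itself the input IP could come from an Ansible fact such as `ansible_default_ipv4.address`, which keeps the scheme free of any coordinated state.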
Advantages:
1. no coordination required
2. elastic - we can add servers at will
3. CIDR blocks allocated to a server that goes down are automatically GC'd - they will be reused when AWS reassigns that server's IP to a new server
Here's code for it: https://github.com/udacity/Apollo/commit/e459832a30a097ba58cffb0b1c4710cc459b19be
Could potentially be addressed by https://github.com/Capgemini/Apollo/issues/395