Investigation into support for Docker SwarmKit by the vic engine
User Statement:
As an admin user, I want to be able to use Docker SwarmKit to manage/orchestrate my VCHs and DCHs if multiple VCHs and DCHs are deployed across geographically distributed clusters. This may be helpful for achieving a variety of goals, e.g., VCH load balancing, VCH redundancy (HA), VCH affinity/anti-affinity rules, etc.
Details: We need to investigate the following:
- What features does Docker SwarmKit support, and what docker remote APIs do those features need? This will help us identify the additional docker commands the vic engine must support for Docker SwarmKit (see the API probe sketch after this list).
- Assuming that the vic-engine supports all the necessary docker APIs, how can we deploy Docker SwarmKit using multiple VCHs as swarm hosts?
- How can we integrate Docker SwarmKit with DCHs (aka DinV)?
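One way to start that gap analysis is to probe a VCH's remote API surface directly and compare it against the endpoints SwarmKit's engine executor drives (container create/start, events, and so on). A minimal sketch, assuming a VCH at vch-1.example.com with the TLS certs that vic-machine generates; the host name, cert paths, and API version are placeholders:

```bash
# Query the remote API version and feature surface exposed by the VCH
# personality. Cert paths are placeholders for this sketch.
CERTS=~/vch-1-certs
curl --cacert $CERTS/ca.pem --cert $CERTS/cert.pem --key $CERTS/key.pem \
     https://vch-1.example.com:2376/v1.25/version

# SwarmKit's engine executor relies on endpoints like /containers/create,
# /containers/{id}/start and /events, so those are the first ones to check.
curl --cacert $CERTS/ca.pem --cert $CERTS/cert.pem --key $CERTS/key.pem \
     https://vch-1.example.com:2376/v1.25/containers/json
```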
Acceptance Criteria:
- A list of the docker APIs that are required for the vic-engine to support Docker SwarmKit (grouped by SwarmKit feature)
- If possible, a list of SwarmKit features that are already supported by the vic-engine, given the docker APIs it currently supports.
- If possible, a guide on how to deploy SwarmKit with multiple VCHs as docker hosts.
- An epic issue which tracks the implementation and testing plan for supporting Docker SwarmKit in the vic-engine
@corrieb Can you comment on this? I believe you've done more research and have some ideas.
Investigation into feasibility is ongoing. SwarmKit has both a containerd and an engine plugin. The engine plugin should work with VIC, but I've hit my first roadblock with it: it pulls images and creates containers, but the containers don't start.
SwarmKit consists of a daemon (swarmd) and a client (swarmctl). It probably makes sense to embed the daemon in the VIC appliance. A user would then have to run the client manually, unless we had some integration in Admiral.
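To make that concrete, here is a rough sketch of the workflow based on the swarmd/swarmctl usage in the SwarmKit README. Pointing the engine executor at a VCH via `--engine-addr` is an assumption on my part; the exact flag may differ between SwarmKit revisions:

```bash
# On the VIC appliance (or any management host): start the SwarmKit manager.
# --engine-addr pointing the executor at the VCH's docker API endpoint is an
# assumption for this sketch, not a verified flag.
swarmd -d /tmp/manager \
  --listen-control-api /tmp/manager/swarm.sock \
  --hostname manager \
  --engine-addr tcp://vch-1.example.com:2376

# The user then drives it manually with the client:
export SWARM_SOCKET=/tmp/manager/swarm.sock
swarmctl node ls
```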
The work here may involve creating a custom VIC plugin for SwarmKit, because we will want it to work with container networks and to accept things like our custom volume options. If we need to do that, we should contribute it back to SwarmKit in OSS.
I can verify that with the fix for https://github.com/vmware/vic/issues/6372, I can get SwarmKit to drive a single VCH as a node, deploying containers as a "service", creating replicas and then deleting them when the service stops. This is a good start to the validation process. There's no reason this simple test shouldn't work with multiple VCHs.
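For reference, the validation flow above maps to swarmctl commands along these lines (the service name and image are illustrative):

```bash
export SWARM_SOCKET=/tmp/manager/swarm.sock

# Deploy a service; SwarmKit schedules the replicas as containerVMs on the VCH.
swarmctl service create --name web --image nginx:1.13
swarmctl service ls

# Scale it up, then inspect the individual tasks.
swarmctl service update web --replicas 3
swarmctl service inspect web
```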
I'd like to define a series of scenarios that would constitute an MVP for running SwarmKit with the VIC engine. These are intended to satisfy particular customer requirements and to ensure that some of VIC's core capabilities are exercised. The initial approach laid out by this issue focuses on API support, which is important, but I want to make sure we cover the important business value from which API support can be derived.
Note that these scenarios do not prescribe how or where swarmd or swarmctl are to be run.
Scenarios:
1) Single VCH "service" capability
Premise:
Ensures that a certain number of instances of a container can be maintained against a single VCH, and that if instances become unhealthy or die, SwarmKit is able to kill and recreate them. The use of a single VCH may seem to negate much of the advantage of the SwarmKit scheduler, but let's not forget that vSphere is already able to schedule containers to "nodes", so unless VCH redundancy is required, the combination certainly has value.
This needs to be able to work at a minimum with vSphere container networks and should be able to pass VolumeStore options for volume creation.
This should work just fine for both stateless and stateful workloads. However, there are significant caveats for stateful workloads. If a single stateful service is made up of multiple containers, by implication there should only be a single persistent source of truth. That means that the containers all need to see a shared filesystem (NFS volume). It's possible to conceive of a model where ephemeral cVMs come up and down against a pool of volumes if a volume represents an isolated thread of work that is advanced whenever a cVM attaches to it. In that scenario, you would need to be able to schedule containers to available volumes, which I'm not sure SwarmKit is capable of. TBD
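On the volume side, VIC's VolumeStore options ride on standard `docker volume create --opt` flags, so whatever SwarmKit passes through must preserve them. A small sketch, where the store label "nfs-store" and the image are illustrative and the store must already be configured on the VCH:

```bash
# Create a shared volume on an NFS-backed VolumeStore (label is illustrative).
docker volume create --opt VolumeStore=nfs-store shared-state

# Multiple containers in the service can then mount the same source of truth.
docker run -d -v shared-state:/data myapp:latest
```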
Pre-requisites:
- vSphere HA should ideally be disabled for the cVMs, otherwise it will compete with SwarmKit. That means we need a way to specify HA for the endpoint VM but not for the cVMs; we need to investigate whether this is possible.
- cVM health-checks need to be implemented
- SwarmKit needs to be able to support certificate-based authentication with the VCH (see the TLS sketch below)
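On that last point, whatever hosts the SwarmKit executor needs the same TLS material a docker client uses against a VCH today. A sketch with placeholder host and paths:

```bash
# Client-side TLS against a VCH; vic-machine emits ca.pem/cert.pem/key.pem
# at create time. The host and cert path are placeholders.
export DOCKER_HOST=tcp://vch-1.example.com:2376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=~/vch-1-certs

docker info   # should succeed before pointing SwarmKit at the same endpoint
```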
Scenarios:
- Create a service in SwarmKit, pointed at a single VCH, and set the number of instances to N, where N > 1. Manually kill containers, or make them unhealthy, and make sure that SwarmKit maintains the right number (scripted in the sketch after this list)
- Create a service with various types of VolumeStore and Volume configuration
- Create a service attached to a container network and show how HAProxy can handle the changing IP addresses as services come and go
- Ensure that SwarmKit is compatible with the container network firewall
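The first scenario could be scripted roughly as follows, reusing the illustrative web service from the sketch above:

```bash
# Start from a known replica count.
swarmctl service update web --replicas 3

# Out-of-band, kill one of the backing containerVMs directly on the VCH.
docker ps --filter name=web
docker kill <container-id>        # placeholder for one of the IDs above

# SwarmKit should detect the dead task and restore the replica count.
swarmctl service inspect web
```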
2) Single VCH container rolling update
Premise:
This is an adaptation of (1) and has very similar pre-requisites. The basic premise here is that you should be able to upgrade an instance of a container by performing a rolling update against a single VCH. This is another benefit of using SwarmKit to schedule containers in the single VCH case, although this should work just as well against multiple VCHs.
The same caveats apply to stateful workloads as in (1).
Scenarios:
- Deploy a service with N container instances, where N > 1. Specify an upgraded image version and show that the containers can be updated with no downtime. This should work against the NFS volume store and with container networks. Show how HAProxy can keep the service seamlessly available (see the sketch after this list).
- Perform a rollback
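In swarmctl terms the rolling update maps to something like the following. The --update-parallelism and --update-delay flag names follow docker swarm-mode conventions and are an assumption for swarmctl; since I haven't verified a dedicated rollback flag, the rollback is expressed as a second update back to the previous image:

```bash
# Roll the service from one image version to the next, two tasks at a time.
# Flag names are assumed from docker swarm-mode conventions.
swarmctl service update web --image nginx:1.14 \
  --update-parallelism 2 --update-delay 10s

# "Rollback" by rolling back to the previous image the same way.
swarmctl service update web --image nginx:1.13 \
  --update-parallelism 2 --update-delay 10s
```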
3) Multiple VCH redundancy
Premise:
It should be possible to use SwarmKit to schedule one or more containers to multiple VCHs. These VCHs may be in different vSphere clusters, different vCenters, different racks or geographies. The purpose of the redundancy is that if a single VCH becomes unhealthy for some reason, SwarmKit is able to shift the workloads to a VCH that is healthy.
This is obviously a more natural fit for stateless containers as non-NFS VolumeStores are not designed to be sharable between VCHs. Even if an external LUN is visible to multiple vCenters, detaching a VMDK from one cVM in one VCH and re-attaching it to another in a different VCH is complex and arguably something of an anti-pattern for containers. That said, we may want to consider a convention if it becomes something customers see value in.
This should however be able to work with our NFS volume store support and we should test that.
Pre-requisites:
- Health checks for the VCH endpoint need to be implemented
- Need to consider how to warn users of the problems associated with data gravity and/or appropriately configure VCHs to have NFS volume stores with the same name.
- Need to have container networks of the same name preconfigured on all VCHs. Need to consider if there is a way around this. Maybe have "container network" be a network driver and allow SwarmKit to "create" networks which are in fact just aliases to the same underlying port group.
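Until something like that network driver exists, the workaround is to pre-plumb identically named container networks at VCH deploy time. A sketch; the target, user, and port group name are placeholders, and vic-machine prompts for the password:

```bash
# Expose the same port group under the same container network name on each VCH.
vic-machine-linux create \
  --target vcenter.example.com \
  --user 'administrator@vsphere.local' \
  --name vch-1 \
  --container-network ProdPortGroup:prod-net

vic-machine-linux create \
  --target vcenter.example.com \
  --user 'administrator@vsphere.local' \
  --name vch-2 \
  --container-network ProdPortGroup:prod-net
```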
Scenarios:
- Deploy N containers to a Swarm cluster of more than one VCH. Look for even distribution of containers. Make a VCH unhealthy in some way. Ensure that SwarmKit detects the health issue and re-deploys container instances to other nodes. This should work with NFS volume support and container networks (with the caveats above).
- Bring a VCH from unhealthy to healthy state and look for SwarmKit to rebalance nodes to it
- Ensure that managed downtime can be handled by explicitly draining a node
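The drain scenario is directly expressible with the client (node names are illustrative):

```bash
# Planned maintenance: drain the node so SwarmKit reschedules its tasks
# onto the remaining healthy VCHs.
swarmctl node drain vch-2
swarmctl service inspect web   # tasks should shift off vch-2

# Bring it back and let the scheduler use it again.
swarmctl node activate vch-2
```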
4) Label-based scheduling to multiple VCHs
Premise:
The premise is simple: the ability to associate labels with VCHs and have SwarmKit apply scheduling rules for containers to those nodes based on the labels. The business value here is that there may be data gravity or latency issues where it's important that particular containers are only ever scheduled to a single VCH, while other containers can be scheduled more flexibly.
Pre-requisites:
- Ability to associate labels with a VCH
Scenarios:
- Set up multiple VCHs, apply labels to them and ensure that correct scheduling occurs.
- Add or remove VCHs in a controlled way and make sure that scheduling is done appropriately
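Assuming swarmctl exposes node labels and scheduling constraints the way docker swarm mode does (the exact flag names here are an assumption, not verified against swarmctl):

```bash
# Label the VCH nodes; the docker swarm-mode equivalent is
# `docker node update --label-add datacenter=us-east <node>`.
swarmctl node update vch-east --label datacenter=us-east
swarmctl node update vch-west --label datacenter=us-west

# Pin a service to one datacenter via a scheduling constraint.
swarmctl service create --name geo-db --image postgres:9.6 \
  --constraint node.labels.datacenter==us-east
```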
Nice project. I will create a lab to run tests similar to yours. I'll be back with feedback, see ya.