Investigation into support for Docker SwarmKit by the vic engine
User Statement:
As an admin user, I want to be able to use Docker SwarmKit to manage/orchestrate my VCHs and DCHs if multiple VCHs and DCHs are deployed across geographically distributed clusters. This may be helpful for achieving a variety of goals, e.g., VCH load balancing, VCH redundancy (HA), VCH affinity/anti-affinity rules, etc.
Details: We need to investigate the following:
- What features does Docker SwarmKit support, and what docker remote APIs do those features need? This will help us identify the additional docker commands the vic engine must support for Docker SwarmKit (see the API probe sketch after this list).
- Assuming that the vic-engine supports all the necessary docker APIs, how can we deploy Docker SwarmKit using multiple VCHs as swarm hosts?
- How can we integrate Docker SwarmKit with DCHs (aka DinV)?
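One way to start that gap analysis is to probe a VCH's remote API surface directly and compare it against the endpoints SwarmKit's engine executor drives (container create/start, events, and so on). A minimal sketch, assuming a VCH at vch-1.example.com with the TLS certs that vic-machine generates; the host name, cert paths, and API version are placeholders:

```bash
# Query the remote API version and feature surface exposed by the VCH
# personality. Cert paths are placeholders for this sketch.
CERTS=~/vch-1-certs
curl --cacert $CERTS/ca.pem --cert $CERTS/cert.pem --key $CERTS/key.pem \
     https://vch-1.example.com:2376/v1.25/version

# SwarmKit's engine executor relies on endpoints like /containers/create,
# /containers/{id}/start and /events, so those are the first ones to check.
curl --cacert $CERTS/ca.pem --cert $CERTS/cert.pem --key $CERTS/key.pem \
     https://vch-1.example.com:2376/v1.25/containers/json
```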
Acceptance Criteria:
- A list of the docker APIs that are required for the vic-engine to support Docker SwarmKit (grouped by SwarmKit feature)
- If possible, a list of SwarmKit features that are already supported by the vic-engine, given the docker APIs it currently supports.
- If possible, a guide on how to deploy SwarmKit with multiple VCHs as docker hosts.
- An epic issue which tracks the implementation and testing plan for supporting Docker SwarmKit in the vic-engine
@corrieb Can you comment on this? I believe you've done more research and have some ideas.
Investigation into feasibility is ongoing. SwarmKit has both a containerd and an engine plugin. The engine plugin should work with VIC, but I've hit my first roadblock with it: it pulls images and creates containers, but the containers don't start.
SwarmKit consists of a daemon (swarmd) and a client (swarmctl). It probably makes sense to embed the daemon in the VIC appliance. A user would then have to run the client manually, unless we had some integration in Admiral.
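To make that concrete, here is a rough sketch of the workflow based on the swarmd/swarmctl usage in the SwarmKit README. Pointing the engine executor at a VCH via `--engine-addr` is an assumption on my part; the exact flag may differ between SwarmKit revisions:

```bash
# On the VIC appliance (or any management host): start the SwarmKit manager.
# --engine-addr pointing the executor at the VCH's docker API endpoint is an
# assumption for this sketch, not a verified flag.
swarmd -d /tmp/manager \
  --listen-control-api /tmp/manager/swarm.sock \
  --hostname manager \
  --engine-addr tcp://vch-1.example.com:2376

# The user then drives it manually with the client:
export SWARM_SOCKET=/tmp/manager/swarm.sock
swarmctl node ls
```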
The work here may involve creating a custom VIC plugin for SwarmKit, because we will want it to work with container networks and to accept things like our custom volume options. If we need to do that, we should contribute it back to SwarmKit in OSS.
I can verify that with the fix for https://github.com/vmware/vic/issues/6372, I can get SwarmKit to drive a single VCH as a node, deploying containers as a "service", creating replicas and then deleting them when the service stops. This is a good start to the validation process. There's no reason this simple test shouldn't work with multiple VCHs.
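For reference, the validation flow above maps to swarmctl commands along these lines (the service name and image are illustrative):

```bash
export SWARM_SOCKET=/tmp/manager/swarm.sock

# Deploy a service; SwarmKit schedules the replicas as containerVMs on the VCH.
swarmctl service create --name web --image nginx:1.13
swarmctl service ls

# Scale it up, then inspect the individual tasks.
swarmctl service update web --replicas 3
swarmctl service inspect web
```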
I'd like to define a series of scenarios that would constitute an MVP for running SwarmKit with the VIC engine. These are intended to satisfy particular customer requirements and to ensure that some of VIC's core capabilities are exercised. The initial approach laid out by this issue focuses on API support, which is important, but I want to make sure we cover the important business value from which API support can be derived.
Note that these scenarios do not prescribe how or where swarmd or swarmctl are to be run.
Scenarios:
1) Single VCH "service" capability
Premise:
Ensures that a certain number of instances of a container can be maintained against a single VCH, and that if instances become unhealthy or die, SwarmKit is able to kill and recreate them. The use of a single VCH may seem to negate much of the advantage of the SwarmKit scheduler, but let's not forget that vSphere is already able to schedule containers to "nodes", so unless VCH redundancy is required, the combination certainly has value.
This needs to be able to work at a minimum with vSphere container networks and should be able to pass VolumeStore options for volume creation.
This should work just fine for both stateless and stateful workloads. However, there are significant caveats for stateful workloads. If a single stateful service is made up of multiple containers, by implication there should only be a single persistent source of truth. That means that the containers all need to see a shared filesystem (NFS volume). It's possible to conceive of a model where ephemeral cVMs come up and down against a pool of volumes if a volume represents an isolated thread of work that is advanced whenever a cVM attaches to it. In that scenario, you would need to be able to schedule containers to available volumes, which I'm not sure SwarmKit is capable of. TBD
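On the volume side, VIC's VolumeStore options ride on standard `docker volume create --opt` flags, so whatever SwarmKit passes through must preserve them. A small sketch, where the store label "nfs-store" and the image are illustrative and the store must already be configured on the VCH:

```bash
# Create a shared volume on an NFS-backed VolumeStore (label is illustrative).
docker volume create --opt VolumeStore=nfs-store shared-state

# Multiple containers in the service can then mount the same source of truth.
docker run -d -v shared-state:/data myapp:latest
```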
Pre-requisites:
- vSphere HA should ideally be disabled for the cVMs, otherwise it will compete with SwarmKit. That means we need a way to specify HA for the endpoint VM but not for the cVMs; we need to investigate whether this is possible.
- cVM health-checks need to be implemented
- SwarmKit needs to be able to support certificate-based authentication with the VCH (see the TLS sketch below)
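On that last point, whatever hosts the SwarmKit executor needs the same TLS material a docker client uses against a VCH today. A sketch with placeholder host and paths:

```bash
# Client-side TLS against a VCH; vic-machine emits ca.pem/cert.pem/key.pem
# at create time. The host and cert path are placeholders.
export DOCKER_HOST=tcp://vch-1.example.com:2376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=~/vch-1-certs

docker info   # should succeed before pointing SwarmKit at the same endpoint
```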
Scenarios:
- Create a service in SwarmKit, pointed at a single VCH, and set the number of instances to N, where N > 1. Manually kill containers, or make them unhealthy, and make sure that SwarmKit maintains the right number (scripted in the sketch after this list)
- Create a service with various types of VolumeStore and Volume configuration
- Create a service attached to a container network and show how HAProxy can handle the changing IP addresses as services come and go
- Ensure that SwarmKit is compatible with the container network firewall
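The first scenario could be scripted roughly as follows, reusing the illustrative web service from the sketch above:

```bash
# Start from a known replica count.
swarmctl service update web --replicas 3

# Out-of-band, kill one of the backing containerVMs directly on the VCH.
docker ps --filter name=web
docker kill <container-id>        # placeholder for one of the IDs above

# SwarmKit should detect the dead task and restore the replica count.
swarmctl service inspect web
```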
2) Single VCH container rolling update
Premise:
This is an adaptation of (1) and has very similar pre-requisites. The basic premise here is that you should be able to upgrade an instance of a container by performing a rolling update against a single VCH. This is another benefit of using SwarmKit to schedule containers in the single VCH case, although this should work just as well against multiple VCHs.
The same caveats apply to stateful workloads as in (1).
Scenarios:
- Deploy a service with N container instances, where N > 1. Specify an upgraded image version and show that the containers can be updated with no downtime. This should work against the NFS volume store and with container networks. Show how HAProxy can keep the service seamlessly available (see the sketch after this list).
- Perform a rollback
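In swarmctl terms the rolling update maps to something like the following. The --update-parallelism and --update-delay flag names follow docker swarm-mode conventions and are an assumption for swarmctl; since I haven't verified a dedicated rollback flag, the rollback is expressed as a second update back to the previous image:

```bash
# Roll the service from one image version to the next, two tasks at a time.
# Flag names are assumed from docker swarm-mode conventions.
swarmctl service update web --image nginx:1.14 \
  --update-parallelism 2 --update-delay 10s

# "Rollback" by rolling back to the previous image the same way.
swarmctl service update web --image nginx:1.13 \
  --update-parallelism 2 --update-delay 10s
```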
3) Multiple VCH redundancy
Premise:
It should be possible to use SwarmKit to schedule one or more containers to multiple VCHs. These VCHs may be in different vSphere clusters, different vCenters, different racks or geographies. The purpose of the redundancy is that if a single VCH becomes unhealthy for some reason, SwarmKit is able to shift the workloads to a VCH that is healthy.
This is obviously a more natural fit for stateless containers as non-NFS VolumeStores are not designed to be sharable between VCHs. Even if an external LUN is visible to multiple vCenters, detaching a VMDK from one cVM in one VCH and re-attaching it to another in a different VCH is complex and arguably something of an anti-pattern for containers. That said, we may want to consider a convention if it becomes something customers see value in.
This should however be able to work with our NFS volume store support and we should test that.
Pre-requisites:
- Health checks for the VCH endpoint need to be implemented
- Need to consider how to warn users of the problems associated with data gravity and/or appropriately configure VCHs to have NFS volume stores with the same name.
- Need to have container networks of the same name preconfigured on all VCHs. Need to consider if there is a way around this. Maybe have "container network" be a network driver and allow SwarmKit to "create" networks which are in fact just aliases to the same underlying port group.
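Until something like that network driver exists, the workaround is to pre-plumb identically named container networks at VCH deploy time. A sketch; the target, user, and port group name are placeholders, and vic-machine prompts for the password:

```bash
# Expose the same port group under the same container network name on each VCH.
vic-machine-linux create \
  --target vcenter.example.com \
  --user 'administrator@vsphere.local' \
  --name vch-1 \
  --container-network ProdPortGroup:prod-net

vic-machine-linux create \
  --target vcenter.example.com \
  --user 'administrator@vsphere.local' \
  --name vch-2 \
  --container-network ProdPortGroup:prod-net
```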
Scenarios:
- Deploy N containers to a Swarm cluster of more than one VCH. Look for even distribution of containers. Make a VCH unhealthy in some way. Ensure that SwarmKit detects the health issue and re-deploys container instances to other nodes. This should work with NFS volume support and container networks (with the caveats above).
- Bring a VCH from unhealthy to healthy state and look for SwarmKit to rebalance nodes to it
- Ensure that managed downtime can be handled by explicitly draining a node
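The drain scenario is directly expressible with the client (node names are illustrative):

```bash
# Planned maintenance: drain the node so SwarmKit reschedules its tasks
# onto the remaining healthy VCHs.
swarmctl node drain vch-2
swarmctl service inspect web   # tasks should shift off vch-2

# Bring it back and let the scheduler use it again.
swarmctl node activate vch-2
```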
4) Label-based scheduling to multiple VCHs
Premise:
The premise is simple: the ability to associate labels with VCHs and have SwarmKit apply scheduling rules for containers to those nodes based on the labels. The business value here is that there may be data gravity or latency issues where it's important that particular containers are only ever scheduled to a single VCH, while other containers can be scheduled more flexibly.
Pre-requisites:
- Ability to associate labels with a VCH
Scenarios:
- Set up multiple VCHs, apply labels to them and ensure that correct scheduling occurs.
- Add or remove VCHs in a controlled way and make sure that scheduling is done appropriately
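Assuming swarmctl exposes node labels and scheduling constraints the way docker swarm mode does (the exact flag names here are an assumption, not verified against swarmctl):

```bash
# Label the VCH nodes; the docker swarm-mode equivalent is
# `docker node update --label-add datacenter=us-east <node>`.
swarmctl node update vch-east --label datacenter=us-east
swarmctl node update vch-west --label datacenter=us-west

# Pin a service to one datacenter via a scheduling constraint.
swarmctl service create --name geo-db --image postgres:9.6 \
  --constraint node.labels.datacenter==us-east
```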
Nice project. I will create a lab to run tests similar to yours. I'll be back with feedback, see ya.