Support for machine instance scaling
Service discovery
Service discovery is the automatic detection and registration of services (in this case, machine instances) that are available for use. In the current setup, each machine registers itself and its details (the URL and external port at which other containers can reach it, and the algorithms it can run) with the main lab server during initialization. The lab server is responsible for load balancing ML requests among the registered machines. Machine registration and load balancing on the lab server need to be updated and tested (they have not been run with more than one machine), and a method to unregister machines needs to be added.
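As a rough illustration of the registration, unregistration, and load balancing responsibilities described above, here is a minimal in-memory sketch. The class and method names (`MachineRegistry`, `register`, `unregister`, `pick`), the example URLs and ports, and the round-robin policy are all assumptions made for illustration; they are not taken from the lab server's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    url: str
    port: int
    algorithms: frozenset


class MachineRegistry:
    """Hypothetical in-memory registry kept by the lab server."""

    def __init__(self):
        self._machines = {}                # (url, port) -> Machine
        self._counters = defaultdict(int)  # algorithm -> number of picks so far

    def register(self, url, port, algorithms):
        # Re-registering the same (url, port) simply replaces the entry.
        self._machines[(url, port)] = Machine(url, port, frozenset(algorithms))

    def unregister(self, url, port):
        # Returns True if the machine was known, False otherwise.
        return self._machines.pop((url, port), None) is not None

    def pick(self, algorithm):
        # Round-robin over the machines that can run the requested algorithm.
        candidates = [m for m in self._machines.values()
                      if algorithm in m.algorithms]
        if not candidates:
            return None
        index = self._counters[algorithm] % len(candidates)
        self._counters[algorithm] += 1
        return candidates[index]


registry = MachineRegistry()
registry.register("http://machine1", 5081, ["LogisticRegression", "SVC"])
registry.register("http://machine2", 5082, ["SVC"])
first = registry.pick("SVC")
second = registry.pick("SVC")
print(first.port, second.port)
```

Round-robin is only one possible policy; a real lab server might instead weight machines by their current load.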
Static scaling
A static scaling implementation would consist of a custom docker-compose configuration with multiple hardcoded machine instances, each with a static port assignment.
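To make the shape of such a configuration concrete, the sketch below builds a dictionary that mirrors the services section of a compose file, with one hardcoded entry per machine. The service names, image name, base port, and environment variable names (`MACHINE_PORT`, `LAB_URL`) are hypothetical placeholders, not values from the project.

```python
import json


def static_machine_services(n, base_port=5081, image="machine",
                            lab_url="http://lab:5080"):
    """Build n machine service entries with fixed, consecutive ports."""
    services = {}
    for i in range(n):
        port = base_port + i
        services[f"machine_{i + 1}"] = {
            "image": image,
            # Compose short syntax: "HOST_PORT:CONTAINER_PORT".
            "ports": [f"{port}:{port}"],
            # Each machine is told its own port so it can self-register.
            "environment": {"MACHINE_PORT": str(port), "LAB_URL": lab_url},
        }
    return {"services": services}


config = static_machine_services(3)
print(json.dumps(config, indent=2))
```

Because every port is fixed in advance, each machine can keep registering itself exactly as it does today; the cost is that adding a machine means editing the configuration.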
Dynamic scaling
Docker orchestrators (docker-compose, Kubernetes, etc.) support dynamic scaling. For example, initializing with docker-compose up --scale machine=3 will create 3 machine instances. However, the external ports assigned to each instance are dynamically allocated (a range can be specified), and Docker does not have a simple introspection method that lets an instance see which external ports it has been assigned. This means that the current self-registration method for machines will need to be modified.
Common practice seems to be to implement dynamic service discovery and registration with something like etcd and HAProxy. This kind of service watches the events Docker emits as containers are created and destroyed, and registers or unregisters machine instances accordingly.
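The event-driven approach can be sketched as a small handler that reacts to container events from outside the container, where the published host port is visible. The event fields (`Type`, `Action`, `Actor.ID`) and the port-binding layout follow the Docker Engine API; the function names, the plain-dict registry, the `inspect_ports` callable, and the private port `5081/tcp` are assumptions made for illustration.

```python
def host_port(port_bindings, private_port):
    """Return the first published host port for e.g. '5081/tcp', or None."""
    # Unpublished ports appear with a value of None in the bindings mapping.
    for binding in port_bindings.get(private_port) or []:
        if binding.get("HostPort"):
            return int(binding["HostPort"])
    return None


def handle_event(registry, event, inspect_ports, private_port="5081/tcp"):
    """Update registry (container id -> host port) from one Docker event."""
    if event.get("Type") != "container":
        return
    container_id = event["Actor"]["ID"]
    action = event.get("Action")
    if action == "start":
        # inspect_ports is a caller-supplied lookup of the container's
        # port bindings (hypothetical helper, injected for testability).
        port = host_port(inspect_ports(container_id), private_port)
        if port is not None:
            registry[container_id] = port
    elif action in ("die", "stop", "destroy"):
        registry.pop(container_id, None)


# Simulated inspection data and events, standing in for a live Docker daemon.
bindings = {"abc123": {"5081/tcp": [{"HostIp": "0.0.0.0", "HostPort": "32768"}]}}
registry = {}
handle_event(registry,
             {"Type": "container", "Action": "start", "Actor": {"ID": "abc123"}},
             bindings.__getitem__)
print(registry)
```

In a live setup the events would likely come from a stream such as `docker.from_env().events(decode=True)` in the Docker SDK for Python, with the bindings read from the container's `attrs["NetworkSettings"]["Ports"]`; tools like Registrator package this same pattern.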
Implementation
Static scaling is easier to implement, but it does not take advantage of the dynamic scaling capabilities of Docker orchestrators and would be more difficult to test and configure for different environments. Dynamic scaling would allow use of features like Docker load balancing, container restarts, and container dependencies. It may require a slightly more complicated architecture (the registration/discovery service), but once implemented, custom configuration would be easier and better tested.
The plan is to start with a static scaling solution; this will allow initial implementation and testing of the machine registration and load balancing done by the lab server, as well as implementation of methods to unregister a machine. Once that is working, the next step is to implement a (perhaps optional) dynamic service discovery/registration architecture.
Possible dynamic scaling tech stack (see this for discussion):
- Registrator - registers/deregisters Docker containers (services) as they come online or go offline. One instance of a Registrator Docker container is run on every host.
- confd - lightweight configuration tool for services, used with etcd and Consul
- etcd or Consul - service registries. One or several Docker containers with which services are registered.
  - etcd is a lightweight and mature solution where services are registered/deregistered through HTTP. Service info is stored as simple k/v pairs.
  - Consul is a heavier system with features such as a framework for service discovery, health checks, a web UI, and support for nodes spread across multiple data centers. Services are registered through HTTP and can be looked up through DNS or HTTP.
Registrator has support for Consul and etcd, so a possible path is to start with etcd and move to Consul if more advanced features are needed.
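To illustrate what storing service info as simple k/v pairs could look like, here is an in-memory stand-in for such a registry. It is not an etcd or Consul client; the class name, the `/services/<service>/<instance>` key layout, and the `host:port` value format are assumptions chosen for the sketch.

```python
class KeyValueRegistry:
    """In-memory stand-in for a k/v service registry (not a real client)."""

    def __init__(self, prefix="/services"):
        self.prefix = prefix
        self.store = {}

    def _key(self, service, instance_id):
        # Hypothetical layout: /services/<service>/<instance_id>
        return f"{self.prefix}/{service}/{instance_id}"

    def register(self, service, instance_id, host, port):
        self.store[self._key(service, instance_id)] = f"{host}:{port}"

    def deregister(self, service, instance_id):
        self.store.pop(self._key(service, instance_id), None)

    def instances(self, service):
        # Discovery is a prefix scan over the keys for one service.
        prefix = f"{self.prefix}/{service}/"
        return sorted(v for k, v in self.store.items() if k.startswith(prefix))


kv = KeyValueRegistry()
kv.register("machine", "host1:machine_1:5081", "10.0.0.5", 32768)
kv.register("machine", "host1:machine_2:5081", "10.0.0.5", 32769)
kv.deregister("machine", "host1:machine_1:5081")
print(kv.instances("machine"))
```

Keeping the lab server's lookup behind a small interface like `instances` would make it easier to start with etcd and later move to Consul.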
While some Docker orchestration systems (Docker Swarm, Kubernetes, ECS, etc.) offer pieces of service discovery functionality, the goal is to implement a complete service discovery stack that does not depend on a specific Docker orchestration system, or to be able to swap out the registration/deregistration piece for different orchestration systems while keeping the same service registry (etcd/Consul).
More docker networking/service discovery resources:
- Docker Networking and Service Discovery (O'Reilly booklet, 2016) - https://learning.oreilly.com/library/view/docker-networking-and/9781492042488/titlepage01.html
- Amazon ECS service discovery - https://aws.amazon.com/blogs/aws/amazon-ecs-service-discovery/
- https://stackoverflow.com/questions/18285212/how-to-scale-docker-containers-in-production