Proposal: Replace the existing gRPC client-side load balancer
Background
While working on v0.7, we identified that Kubernetes' default traffic load balancer does not work well with gRPC services, which prevents Open Match from scaling horizontally.
Current Solution:
The workaround Open Match currently uses requires changing the k8s service type from ClusterIP to None (a headless service), enabling gRPC client-side load balancing, and changing the gRPC address resolver from passthrough to dns. A headless service removes the virtual IP in front of the service and exposes the Pod IP addresses directly via DNS lookup; the client-side round-robin policy can then balance gRPC requests across pods based on the results returned by the DNS resolver.
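As a rough sketch, the headless-service half of this workaround looks like the following (the service name, labels, and port here are illustrative, not Open Match's actual manifests):

```yaml
# Illustrative headless service for a gRPC backend.
# clusterIP: None makes DNS return the Pod IPs directly
# instead of a single virtual IP, so a client dialing
# dns:///om-backend:50505 with a round_robin policy can
# balance across the individual pods.
apiVersion: v1
kind: Service
metadata:
  name: om-backend
spec:
  clusterIP: None
  selector:
    app: om-backend
  ports:
    - port: 50505
      protocol: TCP
```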
Issues:
The existing solution #864 works but introduces some other issues:
- Does not support out-of-cluster LB by simply changing the service type to LoadBalancer. Layer 7 (application layer) LoadBalancers can resolve this issue but are platform specific[1].
- Does not work well with k8s autoscalers. Exposing Pod IP addresses directly is problematic if some of the pods are unavailable[i] or some pods are newly added[ii].
- Connecting to a pod without going through a virtual IP does not guarantee that the pod is healthy at the moment of connection.
- gRPC leverages HTTP/2 underneath, so connections are long-lived. By default, gRPC does not re-query the DNS resolver while a connection is healthy. To make this work with the autoscaler, we have to manually add a timeout to gRPC connections and force them to be reset periodically. This approach works, but IP “discovery” takes time; in the worst case, gRPC may take up to the full timeout period to discover new pods.
- The DNS resolver doesn’t work well with local integration tests. Local integration tests start failing because no DNS resolver can be found when running on localhost.
[1] GCE has started to support L7 LoadBalancers, but integrating them with a GKE deployment requires some extra setup.
Alternatives:
Switching from the existing solution to one that can support out-of-cluster LB.
Service Mesh: LinkerD
Linkerd is so far the most performant open source service mesh, with L7 LB support and ease of monitoring - it comes with a built-in metrics dashboard. However, Linkerd doesn’t provide an out-of-cluster solution yet.
Service proxy: Envoy
Envoy is an open source edge and service proxy; Istio uses it as its proxy solution. Compared to the other alternatives, it has the following features:
- Provides an LB story for out-of-the-cluster connections.
- Can detect outlier hosts and temporarily eject them from the healthy load balancing set.
- Can manage credentials. Supports TLS termination at its listener layer.
- Supports multiple gRPC filters in one config file, which is easy to manage and maintain.
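To make the features above concrete, here is a hedged sketch of what an Envoy cluster for one gRPC service could look like. The cluster name, endpoint, and thresholds are made up for illustration, and field names may vary between Envoy config versions:

```yaml
# Illustrative Envoy cluster for a single gRPC backend.
# STRICT_DNS resolves a headless service name to individual
# Pod IPs; outlier_detection temporarily ejects unhealthy
# hosts from the load balancing set.
clusters:
  - name: om_backend
    connect_timeout: 1s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    http2_protocol_options: {}   # gRPC requires HTTP/2 upstream
    outlier_detection:
      consecutive_5xx: 5         # eject after 5 consecutive errors
      base_ejection_time: 30s    # keep the host out temporarily
    load_assignment:
      cluster_name: om_backend
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: om-backend
                    port_value: 50505
```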
K8S Ingress: Nginx, HA Proxy
k8s ingress works but is awkward for load balancing gRPC services. To use k8s ingress, one has to install an ingress controller first, then define the routing logic via an ingress spec. This approach works well for path-based HTTP/1 apps but doesn’t treat gRPC microservice applications as first-class citizens. For example, the following ingress spec forwards requests sent to /testpath to the endpoint test:80. However, since gRPC doesn’t support subpaths, we would have to define a separate ingress rule for each gRPC service to work around this restriction, which adds a lot of complexity to managing our LB-related configs.
spec:
  rules:
    - http:
        paths:
          - path: /testpath
            backend:
              serviceName: test
              servicePort: 80
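Since gRPC methods live under /&lt;package.Service&gt;/&lt;Method&gt; rather than arbitrary subpaths, working around the restriction means one rule per gRPC service, along these lines (the service names and ports below are illustrative):

```yaml
# Illustrative: one ingress path entry per gRPC service,
# matched on the package.Service prefix. Service names and
# ports are assumptions, not actual Open Match values.
spec:
  rules:
    - http:
        paths:
          - path: /openmatch.Backend
            backend:
              serviceName: om-backend
              servicePort: 50505
          - path: /openmatch.Frontend
            backend:
              serviceName: om-frontend
              servicePort: 50504
```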
Proposal:
Envoy appears to be the best choice among the alternatives, as it is easy to manage and provides an out-of-cluster LB story.
Considerations:
Here is a list of questions that Envoy brings in that may be worth discussing.
- Should we use Envoy for both the in-cluster and out-of-cluster LB solution? There are two options here:
- Use Envoy as the API Gateway for Open Match. All traffic then goes through Envoy, which provides better load balancing algorithms than our existing client-side round robin. This is also a plus for the user experience, as there is no need to expose internal port numbers and endpoints to users.
- Only enable Envoy as a load balancer when the director/gamefrontend is out of the cluster.
- How fast is Envoy? We might need to benchmark the performance of Envoy against Open Match under different scenarios.
- Right now Open Match provides TLS support with some restrictions - see the TLS guide for details. If we choose Envoy as the default LB story for Open Match, can we use it to manage credentials and secrets and remove the security patterns from Open Match instead?
- We may need a new metrics dashboard for Envoy to monitor its performance.
- Our existing tracing story may be affected. Envoy apparently supports tracing with Jaeger, but I’m not sure what the scope of the required changes is.
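On the credentials question above: if Envoy handled TLS, termination would happen at its listener layer, roughly as follows. This is a sketch only; the certificate paths, port, and type URL are assumptions and vary by Envoy config version:

```yaml
# Illustrative: Envoy listener terminating TLS before proxying
# gRPC upstream. Paths and ports are assumptions, not Open
# Match settings.
static_resources:
  listeners:
    - address:
        socket_address: { address: 0.0.0.0, port_value: 443 }
      filter_chains:
        - transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              common_tls_context:
                tls_certificates:
                  - certificate_chain: { filename: /etc/envoy/certs/tls.crt }
                    private_key: { filename: /etc/envoy/certs/tls.key }
```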
As per offline discussion:
- The out-of-cluster load balancing solution can be addressed via documentation / recommendations (we need not change anything in Open Match / core examples for that).
- The in-cluster solution is currently still not perfect - however, fixing it is not high priority.
Ideally, as we address the in-cluster solution, we can see how well that work extends to the out-of-cluster case. However, I think addressing this is lower on the priority list compared to other v0.10 tasks, and it can flow over to v1.0 if we are unable to get to it in v0.10.
Moving this to v1.0.
Yufan, please provide your prototype for this working, and any other findings you haven't documented yet. Thanks!
The Envoy prototype is available at https://github.com/yfei1/open-match/commit/0039fa7b00cc99f134294950cd74d7a4b46d4fb6. All of the findings are documented in this doc.