Repeatable way to deploy istio egress gateways onto AAW

Open Collinbrown95 opened this issue 3 years ago • 0 comments

Epic: #1097

TODO

  • [x] Figure out why default configuration for egress gateway does not work on AAW
  • [x] Advice from CNS on a repeatable way to deploy istio egress gateways on AAW
  • [ ] Implement the egress gateway deployment

Description

Given that the Istio egress gateway for https://github.com/StatCan/daaas/issues/1097 is the first use case for an egress gateway on the AAW, we should brainstorm a repeatable way to deploy egress gateways that aligns with how existing Istio resources are deployed.

I tried to deploy an egress gateway into the cloud-main-system namespace (created in aaw-dev-cc-00 in https://github.com/StatCan/daaas/issues/1133) using a modified version of the Kubernetes yaml approach outlined in the Istio documentation. I was able to get this example working in a local k3d cluster, but not on aaw-dev-cc-00 - I include more detail on these attempts later in this issue.

As per the first point, I thought it would be good to touch base before going too far down any debugging rabbit hole as I'm probably missing something fundamental about how Istio is configured on the AAW.

What I Tried

As a proof of concept, I tried deploying an Istio egress gateway into the cloud-main-system namespace using a slightly modified version of the minimal Kubernetes YAML example from the Istio documentation, which I include below.

# egress-gateway.yaml (based off of https://istio.io/latest/docs/setup/additional-setup/gateway/#deploying-a-gateway)
apiVersion: v1
kind: Service
metadata:
  name: cloud-main-systemgateway
  namespace: cloud-main-system
spec:
  type: LoadBalancer
  selector:
    istio: egressgateway
  ports:
  - port: 80
    name: http
  - port: 443
    name: https
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloud-main-systemgateway
  namespace: cloud-main-system
spec:
  selector:
    matchLabels:
      istio: egressgateway
  template:
    metadata:
      annotations:
        # Select the gateway injection template (rather than the default sidecar template)
        inject.istio.io/templates: gateway
      labels:
        # Set a unique label for the gateway. This is required to ensure Gateways can select this workload
        istio: egressgateway
        # Enable gateway injection. If connecting to a revisioned control plane, replace with "istio.io/rev: revision-name"
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: istio-proxy
        image: auto # The image will automatically update each time the pod starts.
        resources:
          limits:
            memory: "1Gi"
            cpu: "800m"
          requests:
            memory: "600Mi"
            cpu: "400m"
---
# Set up roles to allow reading credentials for TLS
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cloud-main-systemgateway-sds
  namespace: cloud-main-system
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cloud-main-systemgateway-sds
  namespace: cloud-main-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cloud-main-systemgateway-sds
subjects:
- kind: ServiceAccount
  name: default

k3d

Steps to reproduce:

k3d cluster create --config=k3d/config.yaml
kubectl --context=k3d-istio-egress-gateway create ns cloud-main-system
kubectl --context=k3d-istio-egress-gateway label ns/cloud-main-system istio-injection=enabled --overwrite
istioctl --context=k3d-istio-egress-gateway install --set profile=minimal -y
kubectl --context=k3d-istio-egress-gateway apply -f k8s/egressgateway/egress-gateway.yaml

where k3d/config.yaml is as follows:

# k3d/config.yaml
# k3d cluster create --config=k3d/config.yaml
apiVersion: k3d.io/v1alpha3
kind: Simple
name: istio-egress-gateway
# When connecting to the host network, k3d only allows a single server node.
servers: 1
network: host

Result: everything appears to be working correctly on the local k3d cluster. Importantly, the pods in the Deployment are up and running without error (see screenshot below).

image
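Since the screenshot is the only record of the k3d result, a rough sketch of how the same state could be checked from the command line follows. The function name and the KUBECTL, CONTEXT, and NAMESPACE variables are hypothetical helpers introduced for illustration; the Deployment name and label come from egress-gateway.yaml above.

```shell
#!/usr/bin/env bash
# Sketch: confirm the gateway Deployment from egress-gateway.yaml rolled out.
# KUBECTL is overridable (an assumption of this sketch) so the commands can be
# previewed with KUBECTL=echo instead of being run against a cluster.
KUBECTL="${KUBECTL:-kubectl}"
CONTEXT="${CONTEXT:-k3d-istio-egress-gateway}"
NAMESPACE="${NAMESPACE:-cloud-main-system}"

check_gateway_rollout() {
  # Wait until the Deployment reports its replicas available, up to 2 minutes.
  $KUBECTL rollout status deployment/cloud-main-systemgateway \
    --context="$CONTEXT" -n "$NAMESPACE" --timeout=120s || return 1
  # List the pods selected by the gateway label, with node and IP details.
  $KUBECTL get pods --context="$CONTEXT" -n "$NAMESPACE" \
    -l istio=egressgateway -o wide
}
```

Calling `check_gateway_rollout` after the apply step returns non-zero if the rollout does not complete in time, which makes it usable as a quick pass/fail check.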

aaw-dev-cc-00

Steps to reproduce: since the cloud-main-system namespace is already created on aaw-dev-cc-00 (see https://github.com/StatCan/daaas/issues/1133), I just applied k8s/egressgateway/egress-gateway.yaml directly to the cloud-main-system namespace.

kubectl apply -f k8s/egressgateway/egress-gateway.yaml 

When I do this, all of the resources posted make it past admission control, the pods behind the Deployment are successfully scheduled to a node, the docker.io/istio/proxyv2:1.7.8 image is pulled successfully, and the istio-validation container is started successfully.

However, the istio-validation container instantly fails with status Init:ContainerStatusUnknown. There are no further events associated with the pods in the deployment, and there are no log messages associated with the failure. The only log message I can get before the pod is deleted is Stream closed EOF for cloud-main-system/cloud-main-systemgateway-8657489956-ccnwd (istio-validation). The only other information I can find is that the pod finishes with the terminated state and exit code 126 with reason Error.

There may be another way to gain visibility into why the istio-validation container is failing, but I'm not sure how to proceed as I can't figure out how to get more information about what causes the failure.
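One possible way to collect more detail before the pod disappears is sketched below. It only uses standard kubectl subcommands; the function name and the KUBECTL and NAMESPACE variables are hypothetical, and there is no guarantee that the cluster retains logs or events for this particular failure.

```shell
#!/usr/bin/env bash
# Sketch: gather what the API server still knows about a failing gateway pod.
# KUBECTL is overridable (an assumption of this sketch), e.g. KUBECTL=echo
# previews the commands without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-cloud-main-system}"

debug_gateway_pod() {
  local pod="$1"
  # Container states, exit codes, and events attached to the pod.
  $KUBECTL describe pod "$pod" -n "$NAMESPACE"
  # Logs from the previous (terminated) instance of the init container, if any.
  $KUBECTL logs "$pod" -c istio-validation --previous -n "$NAMESPACE"
  # Namespace events in chronological order; these can outlive the pod itself.
  $KUBECTL get events -n "$NAMESPACE" --sort-by=.lastTimestamp
}
```

For example, `debug_gateway_pod cloud-main-systemgateway-8657489956-ccnwd` targets the pod named in the log message above. Exit code 126 conventionally means a command was found but could not be executed, so the container command and image shown by `describe` may be the first things worth checking.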

Next Steps

@sylus and @zachomedia , I would like to get your input about how we should be handling deployments of egress gateways on the AAW. Based on what I've tried so far, I'm guessing that there are some missing prerequisites from my example or that I'm misconfiguring something.

Also, if there is a different way I should be deploying the egress gateway (e.g. using the Istio operator instead of applying Kubernetes manifests directly), I'm happy to explore such options.

Please let me know if I can provide any additional information or if anything above is unclear. Thanks in advance!

Done

  • [x] Can't use auto under image; need to specify a particular envoy proxy image
  • [x] Follow egress gateway example in Istio 1.7 docs and attempt to deploy egress gateway.
  • [x] ~~Change cloud-main-system namespace to have label "istio-injection" = "disabled" (b/c the egress gateway is itself an envoy proxy and shouldn't be injected with a sidecar proxy).~~ The working standalone example actually has istio-injection = "enabled" in the cloud-main-system namespace and everything appears to be working correctly. I'll leave this check list item in case we need to refer back to it but for now the problem seems to be solved.
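To make the injection label and image questions above easier to revisit, here is a small sketch that prints both values. The function name and the KUBECTL and NAMESPACE variables are hypothetical; the label selector matches the one used in egress-gateway.yaml.

```shell
#!/usr/bin/env bash
# Sketch: show the namespace injection label and the images the gateway pods run.
# KUBECTL is overridable (an assumption of this sketch) for previewing commands.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-cloud-main-system}"

show_injection_state() {
  # Value of the istio-injection label ("enabled", "disabled", or empty).
  $KUBECTL get namespace "$NAMESPACE" \
    -o jsonpath='{.metadata.labels.istio-injection}'
  echo
  # Image of every container in the gateway pods, to confirm a concrete
  # proxy image is in use rather than the "auto" placeholder.
  $KUBECTL get pods -n "$NAMESPACE" -l istio=egressgateway \
    -o jsonpath='{.items[*].spec.containers[*].image}'
  echo
}
```

Running `show_injection_state` against aaw-dev-cc-00 and the k3d cluster side by side should make any difference in labels or images visible at a glance.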

TODO

There are a number of prerequisite issues that should be tackled in the following order:

  1. CNS made a number of recommendations, which I detail in https://github.com/StatCan/daaas/issues/1208. The first step is to verify that everything continues to behave correctly when we apply one change at a time to the already-working standalone example. Once all components work with the recommended changes, we can confidently apply them to the relevant configuration and controller code in the AAW codebase.
  2. #1207 - refactor the network policies that are created by the network.go controller in the aaw-kubeflow-profiles-controller.
  3. #1209 - refactor the istio.go controller in the aaw-kubeflow-profiles-controller.
  4. #1210 - Update the Istio Operator controller to watch the cloud-main-system namespace for IstioOperator resources.
  • [ ] Once the above items are completed, the last step is to figure out where to deploy the IstioOperator for the egress-gateway. Currently, this code lives in my standalone example repo, but it needs to be deployed from somewhere in the AAW codebase.
  • [ ] Once a decision is made on the step above, we need to add the IstioOperator manifest to that location and ensure it is deployed correctly by ArgoCD.
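For reference, a minimal sketch of what an IstioOperator resource for an egress gateway in cloud-main-system might look like is given below, wrapped in a shell function so it can be rendered and inspected before anything is applied. This is not the manifest from the standalone example repo: the resource and gateway name cloud-main-egressgateway, the function name, and the choice of the empty profile are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: render a hypothetical IstioOperator that deploys only an egress gateway.
# The names below are illustrative assumptions, not the values used on the AAW.
render_egress_operator() {
  cat <<'EOF'
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: cloud-main-egressgateway      # hypothetical resource name
  namespace: cloud-main-system
spec:
  profile: empty                      # assumption: install nothing but the gateway
  components:
    egressGateways:
    - name: cloud-main-egressgateway  # hypothetical gateway name
      namespace: cloud-main-system
      enabled: true
EOF
}
```

The output can be reviewed with `render_egress_operator`, or piped to `kubectl apply -f -` once the operator controller watches the cloud-main-system namespace. Under ArgoCD the same YAML would live in a Git-tracked location instead of being applied by hand.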

Collinbrown95 • Jun 08 '22 12:06