show pods in dashboard for easier debugging of reconciliation errors
Problem
When a reconciliation fails, e.g. due to an ImagePullBackOff caused by a wrong image tag or a missing pull secret, the dashboard is of little help: it only shows the deployment, not the pods (old and new).
Solution
The dashboard should also show all the pods, not just the deployment. For each pod it should show the whole YAML so that all the events are visible.
Additional context
For a test deployment I have deliberately broken the image tag reference to get an ImagePullBackOff. Reconciliation fails and the old pod stays active. Unfortunately, Weave GitOps doesn't tell me that story: it only tells me that reconciliation is in progress and that something fails the health check. To see the actual problem I have to connect to the cluster and use kubectl:
NAME                                          READY   STATUS             RESTARTS   AGE
pod/release-name-nodebrady-5978488bb8-m62gd   1/1     Running            0          11m
pod/release-name-nodebrady-c9897f486-6rmgn    0/1     ImagePullBackOff   0          5m4s
The graph view should also show the pods, because then I would see the old pod still running and the new pod failing to start due to ImagePullBackOff. Clicking on the failing pod, I would then also see the reason for the ImagePullBackOff:
Warning Failed 17m (x4 over 18m) kubelet Failed to pull image "peter:pan": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/peter:pan": failed to resolve reference "docker.io/library/peter:pan": failed to do request: Head https://registry-1.docker.io/v2/library/peter/manifests/pan: x509: certificate signed by unknown authority
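For reference, the kubectl output shown above can be reproduced with something along these lines (a sketch, not the exact commands used; the namespace is taken from the Deployment manifest further down and the pod name from the listing above):

# list workloads in the app namespace (kubectl get all prefixes pods with "pod/")
kubectl get all -n phippyandfriends-master

# show the events for the failing pod, including the ImagePullBackOff reason
kubectl describe pod release-name-nodebrady-c9897f486-6rmgn -n phippyandfriends-master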
@schdief I'm not sure you want to see all the pods by default: if you have a lot of replicas, that's a lot of screen real estate for a lot of the same thing.
But it does feel like we could do a better job of exposing errors; we'll discuss this and see what we can do.
I agree that showing many pods by default is a bad idea. Maybe you could add a button to see all pods of a deployment, while the default view only shows the count and maybe the failed ones. But if I really want to, I would still like to see them all, even if there are 100 :)
Thanks for looking into it!
Hi @schdief
The graph view should also show the pods, because then I would see the old pod still running and the new pod failing to start due to ImagePullBackOff.
Ah, you do not see the pods in the graph view?
Is this from a kustomization or a helmrelease?
Is this from a kustomization or a helmrelease?
Kustomization (using Weave GitOps 0.33)
This is the YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: 2023-10-06T11:11:28Z
  generation: 2
  labels:
    app.kubernetes.io/name: nodebrady
    helm.sh/chart: nodebrady-v0.3.0
    kustomize.toolkit.fluxcd.io/name: nodebrady-master
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:app.kubernetes.io/name: {}
          f:helm.sh/chart: {}
          f:kustomize.toolkit.fluxcd.io/name: {}
          f:kustomize.toolkit.fluxcd.io/namespace: {}
      f:spec:
        f:replicas: {}
        f:selector: {}
        f:strategy: {}
        f:template:
          f:metadata:
            f:creationTimestamp: {}
            f:labels:
              f:app.kubernetes.io/name: {}
          f:spec:
            f:containers:
              k:{"name":"nodebrady"}:
                .: {}
                f:image: {}
                f:imagePullPolicy: {}
                f:name: {}
                f:ports:
                  k:{"containerPort":3000,"protocol":"TCP"}:
                    .: {}
                    f:containerPort: {}
                    f:protocol: {}
                f:resources: {}
            f:imagePullSecrets:
              k:{"name":"css-qhcr-sdm-dockerconfig"}: {}
              k:{"name":"css-thcr-sdm-dockerconfig"}: {}
    manager: kustomize-controller
    operation: Apply
    time: 2023-10-10T14:56:31Z
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:deployment.kubernetes.io/revision: {}
      f:status:
        f:availableReplicas: {}
        f:conditions:
          .: {}
          k:{"type":"Available"}:
            .: {}
            f:lastTransitionTime: {}
            f:lastUpdateTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Progressing"}:
            .: {}
            f:lastTransitionTime: {}
            f:lastUpdateTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:observedGeneration: {}
        f:readyReplicas: {}
        f:replicas: {}
        f:updatedReplicas: {}
    manager: kube-controller-manager
    operation: Update
    subresource: status
    time: 2023-10-06T12:28:05Z
  name: nodebrady
  namespace: phippyandfriends-master
  resourceVersion: "217768850"
  uid: 66a1966e-3c03-41f8-84b4-2bfc2a8549cd
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: nodebrady
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: nodebrady
    spec:
      containers:
      - image: xxx/nodebrady:20231006.1426.8-master
        imagePullPolicy: Always
        name: nodebrady
        ports:
        - containerPort: 3000
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: css-qhcr-sdm-dockerconfig
      - name: css-thcr-sdm-dockerconfig
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2023-10-06T11:11:32Z
    lastUpdateTime: 2023-10-06T11:11:32Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2023-10-06T11:11:28Z
    lastUpdateTime: 2023-10-06T12:28:05Z
    message: ReplicaSet "nodebrady-6cc7f5bbbc" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
Gotcha! So there is a bug here where we don't show the pods in the graph if the namespace differs from the kustomization.
On the other point of showing the pods in the table: we have all the data available, we just have to figure out a design.
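For illustration, a minimal sketch of what the Flux Kustomization side of this setup presumably looks like (the GitRepository name and path are hypothetical; the names and namespaces are taken from the Deployment labels above). It shows the combination the bug is about: the Kustomization lives in flux-system while the Deployment it applies lands in phippyandfriends-master.

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: nodebrady-master
  namespace: flux-system       # the Kustomization lives here ...
spec:
  interval: 5m
  path: ./nodebrady            # hypothetical path in the repo
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system          # hypothetical source name
# ... while the Deployment it applies sits in namespace phippyandfriends-master,
# which is the case where the graph view currently drops the pods.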