show pods in dashboard for easier debugging of reconciliation errors
Problem
When a reconciliation fails, e.g. due to an ImagePullBackOff caused by a wrong image tag or a missing pull secret, the dashboard is of little help: it only shows the deployment, not the pods (old and new).
Solution
The dashboard should also show all the pods, not just the deployment. For each pod it should show the whole YAML so that all the events are visible.
Additional context
For a test deployment I have deliberately broken the image tag reference to get an ImagePullBackOff. Reconciliation fails and the old pod stays active. Unfortunately, Weave GitOps doesn't tell me that story: it only tells me that reconciliation is in progress and that something fails the health check. To see the actual problem I have to connect to the cluster and use kubectl:
NAME                                          READY   STATUS             RESTARTS   AGE
pod/release-name-nodebrady-5978488bb8-m62gd   1/1     Running            0          11m
pod/release-name-nodebrady-c9897f486-6rmgn    0/1     ImagePullBackOff   0          5m4s
The graph view should also show the pods, because then I would see the old pod still running and the new pod failing to start due to ImagePullBackOff. Clicking on the failing pod, I would then also see the reason for the ImagePullBackOff:
Warning Failed 17m (x4 over 18m) kubelet Failed to pull image "peter:pan": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/peter:pan": failed to resolve reference "docker.io/library/peter:pan": failed to do request: Head https://registry-1.docker.io/v2/library/peter/manifests/pan: x509: certificate signed by unknown authority
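For reference, the kubectl output shown above can be reproduced with something along these lines (a sketch, not the exact commands used; the namespace is taken from the Deployment manifest further down and the pod name from the listing above):

# list workloads in the app namespace (kubectl get all prefixes pods with "pod/")
kubectl get all -n phippyandfriends-master

# show the events for the failing pod, including the ImagePullBackOff reason
kubectl describe pod release-name-nodebrady-c9897f486-6rmgn -n phippyandfriends-master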
@schdief I'm not sure you want to see all the pods by default: if you have a lot of replicas, that's a lot of screen real estate for a lot of the same thing.
But it does feel like we could do a better job of exposing errors; we'll discuss this and see what we can do.
I agree that showing many pods by default is a bad idea. Maybe you could add a button to see all pods of a deployment, while the default view only shows the count and maybe the failed ones. But if I really want to, I would still like to see them all, even if there are 100 :)
Thanks for looking into it!
Hi @schdief
The graph view should also show the pods, because then I would see the old pod still running and the new pod failing to start due to ImagePullBackOff.
Ah, you do not see the pods in the graph view?
Is this from a kustomization or a helmrelease?
Is this from a kustomization or a helmrelease?
Kustomization (using Weave GitOps 0.33)
This is the YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: 2023-10-06T11:11:28Z
  generation: 2
  labels:
    app.kubernetes.io/name: nodebrady
    helm.sh/chart: nodebrady-v0.3.0
    kustomize.toolkit.fluxcd.io/name: nodebrady-master
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:app.kubernetes.io/name: {}
          f:helm.sh/chart: {}
          f:kustomize.toolkit.fluxcd.io/name: {}
          f:kustomize.toolkit.fluxcd.io/namespace: {}
      f:spec:
        f:replicas: {}
        f:selector: {}
        f:strategy: {}
        f:template:
          f:metadata:
            f:creationTimestamp: {}
            f:labels:
              f:app.kubernetes.io/name: {}
          f:spec:
            f:containers:
              k:{"name":"nodebrady"}:
                .: {}
                f:image: {}
                f:imagePullPolicy: {}
                f:name: {}
                f:ports:
                  k:{"containerPort":3000,"protocol":"TCP"}:
                    .: {}
                    f:containerPort: {}
                    f:protocol: {}
                f:resources: {}
            f:imagePullSecrets:
              k:{"name":"css-qhcr-sdm-dockerconfig"}: {}
              k:{"name":"css-thcr-sdm-dockerconfig"}: {}
    manager: kustomize-controller
    operation: Apply
    time: 2023-10-10T14:56:31Z
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:deployment.kubernetes.io/revision: {}
      f:status:
        f:availableReplicas: {}
        f:conditions:
          .: {}
          k:{"type":"Available"}:
            .: {}
            f:lastTransitionTime: {}
            f:lastUpdateTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Progressing"}:
            .: {}
            f:lastTransitionTime: {}
            f:lastUpdateTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:observedGeneration: {}
        f:readyReplicas: {}
        f:replicas: {}
        f:updatedReplicas: {}
    manager: kube-controller-manager
    operation: Update
    subresource: status
    time: 2023-10-06T12:28:05Z
  name: nodebrady
  namespace: phippyandfriends-master
  resourceVersion: "217768850"
  uid: 66a1966e-3c03-41f8-84b4-2bfc2a8549cd
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: nodebrady
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: nodebrady
    spec:
      containers:
      - image: xxx/nodebrady:20231006.1426.8-master
        imagePullPolicy: Always
        name: nodebrady
        ports:
        - containerPort: 3000
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: css-qhcr-sdm-dockerconfig
      - name: css-thcr-sdm-dockerconfig
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2023-10-06T11:11:32Z
    lastUpdateTime: 2023-10-06T11:11:32Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2023-10-06T11:11:28Z
    lastUpdateTime: 2023-10-06T12:28:05Z
    message: ReplicaSet "nodebrady-6cc7f5bbbc" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
Gotcha! So there is a bug here where we don't show the pods in the graph if the namespace differs from the kustomization.
On the other point of showing the pods in the table: we have all the data available, we just have to figure out a design.
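For illustration, a minimal sketch of what the Flux Kustomization side of this setup presumably looks like (the GitRepository name and path are hypothetical; the names and namespaces are taken from the Deployment labels above). It shows the combination the bug is about: the Kustomization lives in flux-system while the Deployment it applies lands in phippyandfriends-master.

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: nodebrady-master
  namespace: flux-system       # the Kustomization lives here ...
spec:
  interval: 5m
  path: ./nodebrady            # hypothetical path in the repo
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system          # hypothetical source name
# ... while the Deployment it applies sits in namespace phippyandfriends-master,
# which is the case where the graph view currently drops the pods.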