helm-controller

Ready status when single pod can't start

Open phillebaba opened this issue 5 years ago • 4 comments

I have found some inconsistent behavior when testing the Ready status of HelmRelease.

The following setup should deploy the Helm charts podinfo and redis, both of which should fail as the tag foo does not exist for any of the images.

apiVersion: source.toolkit.fluxcd.io/v1alpha1
kind: HelmRepository
metadata:
  name: podinfo
  namespace: gitops-system
spec:
  url: https://stefanprodan.github.io/podinfo
  interval: 10m
---
apiVersion: helm.toolkit.fluxcd.io/v2alpha1
kind: HelmRelease
metadata:
  name: frontend
  namespace: gitops-system
spec:
  targetNamespace: webapp
  interval: 5m
  chart:
    spec:
      chart: podinfo
      version: '>=4.0.0 <5.0.0'
      sourceRef:
        kind: HelmRepository
        name: podinfo
      interval: 1m
  values:
    image:
      tag: foo
---
apiVersion: source.toolkit.fluxcd.io/v1alpha1
kind: HelmRepository
metadata:
  name: stable
  namespace: gitops-system
spec:
  url: https://kubernetes-charts.storage.googleapis.com/
  interval: 10m
---
apiVersion: helm.toolkit.fluxcd.io/v2alpha1
kind: HelmRelease
metadata:
  name: redis
  namespace: gitops-system
spec:
  targetNamespace: webapp
  interval: 5m
  chart:
    spec:
      chart: redis
      sourceRef:
        kind: HelmRepository
        name: stable
      interval: 1m
  values:
    image:
      tag: foo

Both result in pods in an ImagePullBackOff state.

NAME                                       READY   STATUS             RESTARTS   AGE
webapp-frontend-podinfo-6694fbcbc4-rvjcn   0/1     ImagePullBackOff   0          6m32s
webapp-redis-master-0                      0/1     ImagePullBackOff   0          4m59s
webapp-redis-slave-0                       0/1     ImagePullBackOff   0          4m59s

Yet the podinfo HelmRelease ends up in a ready state, while the redis HelmRelease does not.

NAME       READY   STATUS                                                     AGE
frontend   True    release reconciliation succeeded                           7m15s
redis      False   Helm install failed: timed out waiting for the condition   5m45s

I would expect both HelmReleases to not be in a ready state.

phillebaba, Sep 13 '20 20:09

This is likely due to Helm's own behaviour for the --wait flag, and a Deployment only having a single replica. See: https://github.com/helm/helm/issues/5814#issuecomment-567130226
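To illustrate, here is a rough sketch of the readiness check as I understand it from the linked Helm issue: a Deployment is treated as ready once its ready replicas reach `replicas - maxUnavailable`. The function and parameter names are hypothetical and this is not Helm's actual code (Helm is written in Go); it only models the arithmetic, assuming `maxUnavailable` has already been resolved to an absolute number.

```python
def deployment_ready(replicas: int, max_unavailable: int, ready_replicas: int) -> bool:
    """Model of the --wait check described in the linked issue (assumed, simplified).

    The Deployment counts as ready when the number of ready replicas is at
    least the desired replica count minus the allowed unavailable pods.
    """
    expected_ready = replicas - max_unavailable
    return ready_replicas >= expected_ready


# Single replica with maxUnavailable=1: zero ready pods still satisfies the check.
print(deployment_ready(replicas=1, max_unavailable=1, ready_replicas=0))  # True

# Single replica with maxUnavailable=0: the same pod state is reported as not ready.
print(deployment_ready(replicas=1, max_unavailable=0, ready_replicas=0))  # False
```

Under this model, a single-replica Deployment whose rolling update strategy allows one unavailable pod passes the wait even though no pod ever starts, which would match the `frontend` result above.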

hiddeco, Sep 14 '20 10:09

Proposed fix here:

https://github.com/helm/helm/pull/8671

Note: you may be able to work around it by setting maxUnavailable differently (or unsetting it).

seaneagan, Sep 14 '20 13:09

Is it worth fixing this ourselves before a new Helm release ships with the fix? Health checks that now use kstatus depend on the status being set properly.

https://github.com/fluxcd/kustomize-controller/pull/101

phillebaba, Sep 14 '20 18:09

@phillebaba there is no fix for this that we can do in fluxcd; it needs to be fixed upstream. We should document the Helm bug in our docs.

stefanprodan, Sep 15 '20 11:09