helm-controller icon indicating copy to clipboard operation
helm-controller copied to clipboard

Surface helm template errors

Open stefanprodan opened this issue 4 years ago • 2 comments

I noticed that when Helm encounters a templating error, the controller will not surface this in the Ready status nor in logs.

Given these values:

ingress:
  enabled: true
  hosts:
    - podinfo.production

The Helm CLI errors out with:

$ helm install -f badvalues.yaml mypodinfo ./charts/podinfo/
Error: template: podinfo/templates/ingress.yaml:28:15: executing "podinfo/templates/ingress.yaml" at <.host>: can't evaluate field host in type interface {}

Using the same values in a HelmRelease the controller logs:

{"level":"error","ts":"2021-06-23T14:25:23.845Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"podinfo","namespace":"podinfo","error":"Helm uninstall failed: uninstall: Release not loaded: podinfo: release: not found"}
{"level":"error","ts":"2021-06-23T14:28:08.066Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"podinfo","namespace":"podinfo","error":"previous release attempt remediation failed"}

The ready status shows no template error:

podinfo  	helmrelease/podinfo	False	Helm uninstall failed: uninstall: Release not loaded: podinfo: release: not found

HelmRelease definition:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: podinfo
spec:
  releaseName: podinfo
  chart:
    spec:
      chart: podinfo
      sourceRef:
        kind: HelmRepository
        name: podinfo
        namespace: flux-system
  interval: 5m
  install:
    remediation:
      retries: 3
  values:
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx
      hosts:
      - podinfo.production

stefanprodan avatar Jun 24 '21 16:06 stefanprodan

might be related, but not sure:

after bootstraping new cluster from existing repo, chart was first stuck in install retries exhausted and then become Helm uninstall failed: uninstall: Release not loaded: ingress-nginx-public: release: not found

root cause was I think absent HR dependency for ingress-nginx-public HR (prometheus for creating service monitor and rules), but while flux suspend|resume usually works on HR stuck in install retries exhausted I was not able to fix Helm uninstall failed up until I've restarted all flux pods: kustomize, helm, source controllers. although around the same time I've updated HR with dependency so might've been that instead.

What I've noticed is that there were zero traces of helm release: no resources or helm secrets. I've also tried to flux delete hr and then reconciling upstream kustomization to force whole HR recreation - but it was always stuck in Helm uninstall failed status. To me it looked like flux somehow recorded somewhere that this HR is installed and needs to be uninstalled and then failing for obvious reasons.

Will be doing same bootstrapping new cluster soon, will check if this happens again.

flux: v0.24.1
helm-controller: v0.14.1
image-automation-controller: v0.18.0
image-reflector-controller: v0.14.0
kustomize-controller: v0.18.2
notification-controller: v0.19.0
source-controller: v0.19.2

tbondarchuk avatar Dec 16 '21 10:12 tbondarchuk

Reproduced with incorrect values for helm chart, so helm template was failing locally but HR status was Helm uninstall failed: uninstall: Release not loaded

I suspect its because of my HR config:

spec:
  install:
    remediation:
      retries: 3

Makes sense - controller tries to install helm chart, fails, tries to uninstall as a remediation step, fails to uninstall since it's a first install, ends up with last failure message.

tbondarchuk avatar Dec 22 '21 16:12 tbondarchuk