Surface helm template errors
I noticed that when Helm encounters a templating error, the controller surfaces it neither in the Ready status nor in its logs.
Given these values:
ingress:
  enabled: true
  hosts:
    - podinfo.production
The Helm CLI errors out with:
$ helm install -f badvalues.yaml mypodinfo ./charts/podinfo/
Error: template: podinfo/templates/ingress.yaml:28:15: executing "podinfo/templates/ingress.yaml" at <.host>: can't evaluate field host in type interface {}
Using the same values in a HelmRelease the controller logs:
{"level":"error","ts":"2021-06-23T14:25:23.845Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"podinfo","namespace":"podinfo","error":"Helm uninstall failed: uninstall: Release not loaded: podinfo: release: not found"}
{"level":"error","ts":"2021-06-23T14:28:08.066Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"podinfo","namespace":"podinfo","error":"previous release attempt remediation failed"}
The ready status shows no template error:
podinfo helmrelease/podinfo False Helm uninstall failed: uninstall: Release not loaded: podinfo: release: not found
HelmRelease definition:
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: podinfo
spec:
  releaseName: podinfo
  chart:
    spec:
      chart: podinfo
      sourceRef:
        kind: HelmRepository
        name: podinfo
        namespace: flux-system
  interval: 5m
  install:
    remediation:
      retries: 3
  values:
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx
      hosts:
        - podinfo.production
This might be related, but I'm not sure:
After bootstrapping a new cluster from an existing repo, the chart was first stuck in install retries exhausted and then became Helm uninstall failed: uninstall: Release not loaded: ingress-nginx-public: release: not found
I think the root cause was a missing HR dependency for the ingress-nginx-public HR (on prometheus, for creating the service monitor and rules). While flux suspend|resume usually works on an HR stuck in install retries exhausted, I was not able to fix Helm uninstall failed until I restarted all flux pods: the kustomize, helm, and source controllers. Around the same time I also updated the HR with the dependency, so that might have been the fix instead.
What I noticed is that there were zero traces of the helm release: no resources and no helm secrets. I also tried flux delete hr and then reconciling the upstream kustomization to force a full HR recreation, but it always got stuck in the Helm uninstall failed status. To me it looked like flux had somehow recorded that this HR was installed and needed to be uninstalled, and then failed for obvious reasons.
I will be bootstrapping a new cluster the same way soon and will check whether this happens again.
flux: v0.24.1
helm-controller: v0.14.1
image-automation-controller: v0.18.0
image-reflector-controller: v0.14.0
kustomize-controller: v0.18.2
notification-controller: v0.19.0
source-controller: v0.19.2
Reproduced with incorrect values for the helm chart: helm template was failing locally, but the HR status was Helm uninstall failed: uninstall: Release not loaded
I suspect it's because of my HR config:
spec:
  install:
    remediation:
      retries: 3
Makes sense: the controller tries to install the helm chart, fails, tries to uninstall as a remediation step, fails to uninstall because it was a first install and no release exists, and ends up reporting only the last failure message.
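To illustrate the sequence described above, here is a minimal Python sketch of how an install-then-remediate loop can mask the original error. This is not helm-controller code: the `install`, `uninstall`, `reconcile_masking`, and `reconcile_preserving` functions and the `HelmError` class are hypothetical names made up for this example, and the host check only mimics the chart's template failure under the assumption that each host entry must be a mapping with a `host` key.

```python
# Hypothetical model of install + remediation; not helm-controller internals.


class HelmError(Exception):
    """Stand-in for an error returned by a Helm action."""


def install(values):
    # Assumption: the template reads `.host` on every entry under ingress.hosts,
    # so a plain string entry fails in the same way as in the report.
    for entry in values["ingress"]["hosts"]:
        if not isinstance(entry, dict) or "host" not in entry:
            raise HelmError("template: can't evaluate field host in type interface {}")


def uninstall(release_exists):
    # A failed first install leaves no release behind, so uninstall cannot find one.
    if not release_exists:
        raise HelmError("uninstall: Release not loaded: podinfo: release: not found")


def reconcile_masking(values):
    """Keep only the most recent failure, which hides the template error."""
    try:
        install(values)
        return "Release reconciliation succeeded"
    except HelmError as err:
        status = f"Helm install failed: {err}"
    try:
        uninstall(release_exists=False)
    except HelmError as err:
        status = f"Helm uninstall failed: {err}"  # overwrites the install error
    return status


def reconcile_preserving(values):
    """Report the install error first, then append the remediation failure."""
    try:
        install(values)
        return "Release reconciliation succeeded"
    except HelmError as install_err:
        try:
            uninstall(release_exists=False)
        except HelmError as uninstall_err:
            return (
                f"Helm install failed: {install_err}; "
                f"remediation also failed: {uninstall_err}"
            )
        return f"Helm install failed: {install_err}"


if __name__ == "__main__":
    bad_values = {"ingress": {"enabled": True, "hosts": ["podinfo.production"]}}
    print(reconcile_masking(bad_values))
    print(reconcile_preserving(bad_values))
```

With the bad values from this issue, the first function returns only the uninstall failure, which matches the status reported here, while the second keeps the template error visible. Whether the controller should combine the messages this way or report them separately is a design choice this sketch does not settle.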