Helm upgrade suddenly fails with "has no deployed releases" and HelmRelease reconciliation fails
While reconciling a HelmRelease, the controller sometimes starts reporting "Helm upgrade failed: project has no deployed releases." This happens about once every 2-3 days.
Q: Was the helm-controller restarted? A: No
kubectl -n flux-system get pods helm-controller-6b456768d5-mcwcc
NAME                               READY   STATUS    RESTARTS   AGE
helm-controller-6b456768d5-mcwcc   1/1     Running   0          2d5h
project = fe-stack
Error:
flux get hr fe-stack -n qa-team
NAME       READY   MESSAGE                                                     REVISION   SUSPENDED
fe-stack   False   Helm upgrade failed: "fe-stack" has no deployed releases               False
kubectl describe helmreleases.helm.toolkit.fluxcd.io fe-stack -n qa-team
Last Helm logs:
preparing upgrade for fe-stack
resetting values to the chart's original version
performing update for fe-stack
creating upgraded release for fe-stack
Reason: UpgradeFailed
Status: False
Type: Released
Failures: 1
Helm Chart: qa-team/qa-team-fe-stack
Last Attempted Revision: v5.0.27
Last Attempted Values Checksum: 6a7c1ebdbb475c995c093799e1f3824926229e71
Last Handled Reconcile At: 2021-05-07T17:54:54.795085788+05:30
Observed Generation: 22
Upgrade Failures: 1
Events:
Type    Reason  Age                    From             Message
----    ------  ----                   ----             -------
Normal  error   19m (x19 over 120m)    helm-controller  reconciliation failed: Helm upgrade failed: "fe-stack" has no deployed releases
Normal  info    9m53s (x30 over 29h)   helm-controller  Helm upgrade has started
Normal  error   9m47s (x21 over 120m)  helm-controller  Helm upgrade failed: "fe-stack" has no deployed releases
Logs with debug level: hr-controller.log
Hi @stefanprodan, this is a production blocker for us going live with Flux v2. I can also contribute if that helps.
Thank you
I suspect this is related to, or another variant of, #149. Helm users have a long history of running into it themselves:
https://github.com/helm/helm/issues/5595
https://github.com/helm/helm/issues/7160
Judging from the shared logs, there seems to be a correlation with Helm's --wait behavior, which appears to corrupt the release storage in some edge cases.
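To check what the release storage looks like when this happens, something along these lines may help. It is a minimal sketch, assuming Helm 3 with the default Secret storage driver, where each revision is stored as a Secret labelled owner=helm and name=<release>. The helper names release_selector and inspect_release are hypothetical. As far as I understand, the "has no deployed releases" error generally shows up when no stored revision has the status "deployed".

```shell
#!/bin/sh
# Hypothetical helper: build the label selector that Helm 3 puts on the
# Secrets backing a release (assumes the default Secret storage driver).
release_selector() {
  printf 'owner=helm,name=%s' "$1"
}

# Hypothetical helper: print the revision history and the backing Secrets
# for one release, so the status of every stored revision is visible.
inspect_release() {
  release="$1"
  namespace="$2"
  # Revision list with per-revision status (deployed, failed, pending-upgrade, ...)
  helm history "$release" -n "$namespace"
  # The stored revisions themselves; the "status" label mirrors the status above
  kubectl get secrets -n "$namespace" -l "$(release_selector "$release")" --show-labels
}
```

For the release in this report, the call would be `inspect_release fe-stack qa-team`. If no revision is listed as deployed, that would be consistent with the error message above.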
It's happening a lot during development. Is there any workaround that doesn't require deleting the HelmRelease?
It's a manual step, but this helped with the stuck release: https://github.com/helm/helm/issues/5595#issuecomment-717024123