Initializer job name in Helm chart uses a date, which prevents the application in Argo CD from being in Sync
Hi,
I’m using the Helm chart version 1.6.153
Is your feature request related to a problem? Please describe
Because of the helper initializer.jobname, a new job name can be generated every minute.
This results in a permanent OutOfSync status of the defectdojo application within Argo CD, with a missing job resource.
Describe the solution you'd like
One solution could be to use Helm hooks like pre-install and/or pre-upgrade and always use the same job name (thanks to hook deletion policies).
Additional context
Helper causing the problem:

```yaml
{{- define "initializer.jobname" -}}
{{ .Release.Name }}-initializer-{{- printf "%s" now | date "2006-01-02-15-04" -}}
{{- end -}}
```
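One way to make the name stable would be to branch on a chart value. This is only a sketch of the idea, assuming a hypothetical `initializer.staticName` values key; the actual chart implementation may differ:

```yaml
{{- define "initializer.jobname" -}}
{{- if .Values.initializer.staticName -}}
{{ .Release.Name }}-initializer
{{- else -}}
{{ .Release.Name }}-initializer-{{- printf "%s" now | date "2006-01-02-15-04" -}}
{{- end -}}
{{- end -}}
```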
Argo CD dashboard:
Thank you, @Brawdunoir, for pointing this out. I had planned to look closer at my similar/related issues, and you probably found the reason.
Unfortunately, pre- hooks would not work well for new installations. The job would be triggered before the database is started, and the database would not start until the job finished (as far as I know; we solved a similar issue with post- hooks).
I'm still not sure what the best solution is, but "same name" sounds suitable.
I would like to look into it more deeply.
Sidenote: this is a side-effect of how Argo handles charts. As far as I know, there is no issue for "regular" helm deployments right now.
Thanks for the quick response! I have quite a bit of experience building charts and ran into this job issue recently, but without your chart dependency constraints (we use Helmfile to deploy/order multiple charts).
Reusing the same job name without hooks will not work because a Job has a lot of immutable fields (including pod labels and container images). That's the reason behind the hook deletion policies.
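To illustrate the point about deletion policies: with standard Helm hook annotations, the old Job can be removed before the new one is created, which sidesteps the immutable-field problem. A minimal sketch using Helm's documented hook annotations (the hook choice here is illustrative, not the chart's actual configuration):

```yaml
metadata:
  annotations:
    # Run the Job as a hook instead of a regular resource.
    "helm.sh/hook": pre-install,pre-upgrade
    # Delete the previous Job before creating a new one,
    # and clean up once it succeeds.
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded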
It would be interesting to understand how other charts, like Bitnami's, are coping with this issue.
I suspect that, since (to my knowledge) Helm itself has no proper mechanism to handle such situations, you can only rely on an init container in your job. Your application will still be launched without prior initialization of the database, but that's life with Kubernetes: your application should handle such cases programmatically (with checks and retries). After maybe a few restarts due to readiness probes, everything should eventually stabilize.
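As a concrete example of that init-container approach, a simple wait loop can block the job until the database answers on its port. This is only a sketch; the service name and port are assumptions, not the chart's actual values:

```yaml
initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command:
      - sh
      - -c
      # Poll the (hypothetical) database service until it accepts connections.
      - until nc -z defectdojo-postgresql 5432; do echo "waiting for database"; sleep 2; done
```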
With a regular Helm deployment, are the previous jobs cleaned up on each upgrade?
This issue has been solved in #11237 and released as part of 2.40.1. Sorry, I forgot to mention it here.
You can just set .initializer.staticName to true.
@Brawdunoir, does it work for you?
We upgraded this morning and set .initializer.staticName to true. However, because of the job's default TTL, it is deleted after 1 minute, and the application becomes OutOfSync again in Argo CD because the job goes missing (see screenshot below).
Setting .initializer.keepSeconds to 0 in addition to .initializer.staticName to true fixes the initial issue. Thanks for the work!
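For anyone landing here, the working combination described above would look like this in a values file (sketched from the option names mentioned in this thread):

```yaml
initializer:
  # Use a stable job name so Argo CD can track the resource.
  staticName: true
  # Disable the TTL so the job is not deleted and marked missing.
  keepSeconds: 0
```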
It’s nice that this value can be changed, but I don’t understand why keepSeconds defaults to 60s.
@Brawdunoir, keepSeconds: 60s is there for backward compatibility. For Argo CD deployments it does not make sense, but whatever you put there (X, -1, 0, or null), the job will stay. See #11257
Plus, I recommend:

```yaml
jobAnnotations:
  argocd.argoproj.io/hook: "Sync"
```
Yes, it will trigger a job during each sync. But as far as I know, this is the only way to solve it without problems for future updates: otherwise you would face an issue when trying to patch the image (because of a newer version of DD), which is a bit problematic. The hook simply executes every time based on the latest definition.
Thanks again for your feedback. For my understanding, why Sync and not PreSync?
Do we want to continue updating defectdojo if the initializer job fails?
Because on a first deployment it would trigger initialization before the database is deployed, creating a deadlock.
Plus, an update of the DD containers will not happen before the initializer has applied all needed migrations, thanks to the initContainer db-migration-checker: https://github.com/DefectDojo/django-DefectDojo/blob/c0604918253b1ede40d316a07d5441a6f58f30e8/helm/defectdojo/templates/_helpers.tpl#L139-L145
But I'm open to any recommendations for improvements 😄
Indeed, good point about the deadlock!
Putting PreSync with different sync waves on all database resources and on the initializer job would remove the need for this initContainer on the DD containers when using Argo CD (or regular Helm hooks). However, this seems overkill…
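For completeness, that wave-based ordering would look roughly like the following, using Argo CD's documented hook and sync-wave annotations. This is only a sketch; which resources get which wave is an assumption:

```yaml
# On database resources (e.g. StatefulSet, Service):
metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "0"
---
# On the initializer Job, a later wave so it runs after the database:
metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "1"
```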
Keeping it simple and just putting Sync on the initializer job is fine by me 👍🏼