Add ability to disable caching secrets
Fixes #512 .
By default, controller-runtime gives you a cache-backed client, which will perform a LIST operation on any API type that you fetch, even if you use a GET request like we do: https://github.com/fluxcd/helm-controller/blob/5a7d0f8535406d155ed7d0634b27574538febed0/controllers/helmrelease_controller.go#L556
So unless you restrict helm-controller to a specific namespace with --watch-all-namespaces=false, then it will attempt to populate its cache by listing all secrets in the cluster, even if it only needs to access a few in specific namespaces. That requires giving helm-controller the ability to read every secret in the cluster.
This PR allows users to avoid that restriction by disabling caching of secrets. Now users can deploy HelmReleases that use valuesFrom without having to grant Flux access to more secrets than necessary.
Overall, I do see the benefits of offering as an option the capability of decreasing the RBAC requirements in which the controller runs under. As discussed here, by default, even when impersonation is in place, the controller service account still requires list access at cluster-level.
There are some considerations worth taking around the impact of this. But giving Flux users the option to decide based on their priorities and security vs performance trade-offs, does not seem like a terrible idea.
I am just not sure about the API. Potentially, users may want to disable cache for other objects in the future, so --disable-cache-for may provide more flexibility. But, I would be keen on hearing other maintainers take on this.
@pjbgf I was thinking the same thing originally about --disable-cache-for, but then we'd have to parse strings like "core/v1/Secrets" into Kubernetes objects, which didn't seem straightforward to me (although I am a weak golang coder 😅 )
If you know of a helper function that does the job, I'd be happy to include it!
Hello again! Just following up on this. Given the complexity of parsing API types from strings and the lack of a defined use-case for disabling caches for things other than secrets, I think we ought to stick with just --no-cache-secrets
@mac-chaffee after some thought on this and discussions with @darkowlzz, we think that ConfigMaps and Secrets could be tied in together, as this can lower the running memory requirements, on top of the RBAC requirements benefits you are looking for.
This has been done in CAPI a while ago and should work well for us too.
Therefore, can you please:
- Bundle both
ConfigMapsandSecretsas part of this PR. - Add an opt-out Feature Gate instead of a flag:
CacheSecretsAndConfigMaps. Source controller is a good example for feature gates in Flux: https://github.com/fluxcd/source-controller/blob/885c9f2cba44bb8a9130ef839a1dfcf0bead2c94/internal/features/features.go#L35-L37. - Rebase the PR.
Sounds good. To clarify, should CacheSecretsAndConfigMaps be set to true by default? or false?
To clarify, should CacheSecretsAndConfigMaps be set to true by default? or false?
false by default looks good, this will help us understand the impact more quickly whilst also providing a way out for users.
I am running a few tests with this version. Can you confirm the RBAC needs aligns with the expectations (as per linked issue)?
Can you confirm the RBAC needs aligns with the expectations (as per linked issue)?
Yes I ran it locally and was unpleasantly surprised to see it connect to my production cluster, but at least I was able to confirm the RBAC was working with the flag enabled/disabled!
Tested this against my setup, all works as expected. In terms of memory footprint, I don't have many secrets and configmaps in the cluster (~105 objects), but it already makes a dent. At a larger cluster this should be even more substantial:
| Cache Enabled | Cache Disabled | |
|---|---|---|
| Peak | 112MiB | 66MiB |
| Valley | 43MiB | 17MiB |
I really like this addition thanks @mac-chaffee. I agree that this should be added to the other controllers behind feature gate.
To sort the issue in the e2e tests, please run make fmt vet. Issue with the fuzz tests is sorted by rebasing.
Oh wait I forgot the docs. Seems we don't have docs for the existing CLI flags, so I can add that too. Thinking the most logical place to put that would be https://github.com/fluxcd/helm-controller/blob/main/docs/spec/README.md instead of the page on helmreleases since spec/README is linked from the top-level README as "controller".
@mac-chaffee we do have flag documentation, but they're maintained in the options.md files in the directories found here: https://github.com/fluxcd/website/tree/main/content/en/flux/components
Is it really fine to have it false by default? This changes the controller behaviour so I would and I wonder what is the impact on a cluster where everything is deployed with Flux.
@mac-chaffee we do have flag documentation, but they're maintained in the
options.mdfiles in the directories found here: https://github.com/fluxcd/website/tree/main/content/en/flux/components
Ah thanks. I removed the docs from this PR (we don't want two sources of truth for the cli flags) and will update fluxcd/website later
Is it really fine to have it
falseby default? This changes the controller behaviour so I would and I wonder what is the impact on a cluster where everything is deployed with Flux.
Think based on the (realistic) assumption that we only need a tiny subset of Secrets in any cluster makes this a fair choice. Helm itself maintains its own (extremely expensive) cache anyway, so in terms of API request overhead this should be negligible.