helm-controller Add ability to disable caching secrets

Fixes #512 .

By default, controller-runtime gives you a cache-backed client, which will perform a LIST operation on any API type that you fetch, even if you use a GET request like we do: https://github.com/fluxcd/helm-controller/blob/5a7d0f8535406d155ed7d0634b27574538febed0/controllers/helmrelease_controller.go#L556

So unless you restrict helm-controller to a specific namespace with --watch-all-namespaces=false, then it will attempt to populate its cache by listing all secrets in the cluster, even if it only needs to access a few in specific namespaces. That requires giving helm-controller the ability to read every secret in the cluster.

This PR allows users to avoid that restriction by disabling caching of secrets. Now users can deploy HelmReleases that use valuesFrom without having to grant Flux access to more secrets than necessary.

Aug 08 '22 19:08 mac-chaffee

Overall, I do see the benefits of offering as an option the capability of decreasing the RBAC requirements in which the controller runs under. As discussed here, by default, even when impersonation is in place, the controller service account still requires list access at cluster-level.

There are some considerations worth taking around the impact of this. But giving Flux users the option to decide based on their priorities and security vs performance trade-offs, does not seem like a terrible idea.

I am just not sure about the API. Potentially, users may want to disable cache for other objects in the future, so --disable-cache-for may provide more flexibility. But, I would be keen on hearing other maintainers take on this.

Aug 12 '22 14:08 pjbgf

@pjbgf I was thinking the same thing originally about --disable-cache-for, but then we'd have to parse strings like "core/v1/Secrets" into Kubernetes objects, which didn't seem straightforward to me (although I am a weak golang coder 😅 )

If you know of a helper function that does the job, I'd be happy to include it!

Aug 12 '22 22:08 mac-chaffee

Hello again! Just following up on this. Given the complexity of parsing API types from strings and the lack of a defined use-case for disabling caches for things other than secrets, I think we ought to stick with just --no-cache-secrets

Dec 12 '22 15:12 mac-chaffee

@mac-chaffee after some thought on this and discussions with @darkowlzz, we think that ConfigMaps and Secrets could be tied in together, as this can lower the running memory requirements, on top of the RBAC requirements benefits you are looking for.

This has been done in CAPI a while ago and should work well for us too.

Therefore, can you please:

Bundle both ConfigMaps and Secrets as part of this PR.
Add an opt-out Feature Gate instead of a flag: CacheSecretsAndConfigMaps. Source controller is a good example for feature gates in Flux: https://github.com/fluxcd/source-controller/blob/885c9f2cba44bb8a9130ef839a1dfcf0bead2c94/internal/features/features.go#L35-L37.
Rebase the PR.

Dec 13 '22 14:12 pjbgf

Sounds good. To clarify, should CacheSecretsAndConfigMaps be set to true by default? or false?

Dec 13 '22 18:12 mac-chaffee

To clarify, should CacheSecretsAndConfigMaps be set to true by default? or false?

false by default looks good, this will help us understand the impact more quickly whilst also providing a way out for users. I am running a few tests with this version. Can you confirm the RBAC needs aligns with the expectations (as per linked issue)?

Dec 14 '22 20:12 pjbgf

Can you confirm the RBAC needs aligns with the expectations (as per linked issue)?

Yes I ran it locally and was unpleasantly surprised to see it connect to my production cluster, but at least I was able to confirm the RBAC was working with the flag enabled/disabled!

Dec 14 '22 20:12 mac-chaffee

Tested this against my setup, all works as expected. In terms of memory footprint, I don't have many secrets and configmaps in the cluster (~105 objects), but it already makes a dent. At a larger cluster this should be even more substantial:

	Cache Enabled	Cache Disabled
Peak	112MiB	66MiB
Valley	43MiB	17MiB

Dec 14 '22 21:12 pjbgf

I really like this addition thanks @mac-chaffee. I agree that this should be added to the other controllers behind feature gate.

Dec 16 '22 11:12 souleb

To sort the issue in the e2e tests, please run make fmt vet. Issue with the fuzz tests is sorted by rebasing.

Dec 16 '22 11:12 hiddeco

Oh wait I forgot the docs. Seems we don't have docs for the existing CLI flags, so I can add that too. Thinking the most logical place to put that would be https://github.com/fluxcd/helm-controller/blob/main/docs/spec/README.md instead of the page on helmreleases since spec/README is linked from the top-level README as "controller".

Dec 16 '22 21:12 mac-chaffee

@mac-chaffee we do have flag documentation, but they're maintained in the options.md files in the directories found here: https://github.com/fluxcd/website/tree/main/content/en/flux/components

Dec 19 '22 08:12 hiddeco

Is it really fine to have it false by default? This changes the controller behaviour so I would and I wonder what is the impact on a cluster where everything is deployed with Flux.

Dec 19 '22 14:12 souleb

@mac-chaffee we do have flag documentation, but they're maintained in the options.md files in the directories found here: https://github.com/fluxcd/website/tree/main/content/en/flux/components

Ah thanks. I removed the docs from this PR (we don't want two sources of truth for the cli flags) and will update fluxcd/website later

Dec 19 '22 14:12 mac-chaffee

Is it really fine to have it false by default? This changes the controller behaviour so I would and I wonder what is the impact on a cluster where everything is deployed with Flux.

Think based on the (realistic) assumption that we only need a tiny subset of Secrets in any cluster makes this a fair choice. Helm itself maintains its own (extremely expensive) cache anyway, so in terms of API request overhead this should be negligible.

Dec 19 '22 15:12 hiddeco