Add a new metric to expose local cached items' size
Background
In controller-runtime, the default client caches everything it reads: unless you use a read-only client or custom cache policies (e.g., ClientDisableCacheFor, NewCache), a client.Get for a single object starts an informer that lists and watches all objects of that kind. Therefore, calling client.Get on a specific pod in a large cluster with many pods can lead to high memory usage.
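For illustration, recent controller-runtime releases expose the same knob through client.Options (the successor to the ClientDisableCacheFor manager option); a minimal sketch of bypassing the cache for Pods, assuming a standard manager setup:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// With DisableFor, reads of Pods go straight to the API server
	// instead of starting an informer that lists and watches every
	// Pod in the cluster.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Client: client.Options{
			Cache: &client.CacheOptions{
				DisableFor: []client.Object{&corev1.Pod{}},
			},
		},
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // start the manager as usual
}
```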
Proposal
Add a new metric to expose the local cached items' size. We could then use it to analyze high memory usage caused by a misused cache policy; a rough sketch follows.
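A sketch of what such a metric might look like, registered with controller-runtime's Prometheus registry; the metric name, labels, and the recordCacheSize helper are all hypothetical, not an existing API:

```go
package cachemetrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"k8s.io/client-go/tools/cache"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

// cachedItems is a hypothetical gauge counting objects held in the
// local informer cache, labelled per GroupVersionKind.
var cachedItems = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "controller_runtime_cache_items", // hypothetical metric name
	Help: "Number of objects in the local informer cache, per GVK.",
}, []string{"group", "version", "kind"})

func init() {
	// controller-runtime exposes its Prometheus registry via pkg/metrics,
	// so anything registered here is served on the manager's /metrics endpoint.
	metrics.Registry.MustRegister(cachedItems)
}

// recordCacheSize samples the item count of one informer's backing store
// (client-go's cache.Store). Callers would invoke it periodically, or from
// event handlers, for each informer the manager has started.
func recordCacheSize(store cache.Store, group, version, kind string) {
	cachedItems.WithLabelValues(group, version, kind).Set(float64(len(store.List())))
}
```

Labelling per GVK keeps cardinality bounded by the number of watched types, which speaks to the cardinality concern raised below.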
#3054
@halfcrazy, looks like this will be done by #3202
I'm afraid #3202 cannot resolve this issue. AFAIK, this won't be possible until https://github.com/kubernetes/kubernetes/pull/129160 is merged; we would also need to register the informer metric.
/cc @sbueringer
Yeah, we're going to wait for this to be implemented in k/k. Then we can probably pick it up (if it's safe from a metric cardinality point of view)
Thanks @sbueringer! Could you help review this PR, https://github.com/kubernetes/kubernetes/pull/129160? It would be great to keep things moving forward.
Huge backlog unfortunately at the moment. I'll try to get to it, but I can't promise it.
@sbueringer Thank you.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale