controller-runtime

Add a new metric to expose the size of locally cached items

Open halfcrazy opened this issue 10 months ago • 12 comments

Background: In controller-runtime, the default behaviour is to cache all data read through client.Get, unless you use a read-only client or a custom cache policy (e.g., ClientDisableCacheFor, NewCache). As a result, calling client.Get for a single pod in a large cluster with many pods can lead to high memory usage.
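To make the memory concern concrete, here is a minimal toy model of the behaviour described above. It is not controller-runtime code: `toyCache`, `newToyCache`, `Get`, and `CachedItems` are hypothetical names, and the sketch assumes that the first read for a kind fills the local store with every object of that kind.

```go
package main

import "fmt"

// toyCache is a hypothetical stand-in for an informer-backed cache.
// It is NOT controller-runtime's implementation, only an illustration.
type toyCache struct {
	cluster map[string][]string        // kind -> names of all objects in the cluster
	store   map[string]map[string]bool // kind -> names held in the local cache
}

func newToyCache(cluster map[string][]string) *toyCache {
	return &toyCache{cluster: cluster, store: map[string]map[string]bool{}}
}

// Get reports whether the named object exists. Assumption: the first Get for
// a kind loads every object of that kind into the local store.
func (c *toyCache) Get(kind, name string) bool {
	if _, ok := c.store[kind]; !ok {
		items := make(map[string]bool, len(c.cluster[kind]))
		for _, n := range c.cluster[kind] {
			items[n] = true
		}
		c.store[kind] = items
	}
	return c.store[kind][name]
}

// CachedItems returns how many objects of the given kind are held locally.
func (c *toyCache) CachedItems(kind string) int {
	return len(c.store[kind])
}

func main() {
	pods := make([]string, 0, 10000)
	for i := 0; i < 10000; i++ {
		pods = append(pods, fmt.Sprintf("pod-%d", i))
	}
	c := newToyCache(map[string][]string{"Pod": pods})

	found := c.Get("Pod", "pod-42") // a single lookup...
	fmt.Println(found, c.CachedItems("Pod")) // ...yet all 10000 pods are cached
}
```

In this model, one lookup leaves the whole kind resident in memory, which is the pattern the issue is concerned about.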

Proposal: Add a new metric that exposes the size of the locally cached items, so that we can analyze high memory usage caused by a misused cache policy.
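As a rough illustration of what such a metric could look like, the sketch below renders per-kind item counts as gauge samples in the Prometheus text exposition format. The metric name `example_cache_items`, the `kind` label, and the function `renderCacheMetrics` are assumptions made up for this example. They are not an existing controller-runtime or client-go metric, and a real implementation would more likely register a gauge with the metrics registry than build strings by hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderCacheMetrics formats per-kind cached item counts as gauge samples.
// The metric name and label are hypothetical, chosen only for illustration.
func renderCacheMetrics(counts map[string]int) string {
	kinds := make([]string, 0, len(counts))
	for k := range counts {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds) // stable output order regardless of map iteration

	var b strings.Builder
	b.WriteString("# TYPE example_cache_items gauge\n")
	for _, k := range kinds {
		fmt.Fprintf(&b, "example_cache_items{kind=%q} %d\n", k, counts[k])
	}
	return b.String()
}

func main() {
	// A Pod count far larger than the rest would hint at an unintended
	// cluster-wide cache for that kind.
	fmt.Print(renderCacheMetrics(map[string]int{"Pod": 10000, "ConfigMap": 12}))
}
```

Labelling by kind keeps the number of series bounded by the number of cached types.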

halfcrazy avatar Mar 28 '25 03:03 halfcrazy

#3054

halfcrazy avatar Apr 01 '25 01:04 halfcrazy

@halfcrazy, looks like this will be done by https://github.com/kubernetes-sigs/controller-runtime/issues/3202

krisztianfekete avatar May 09 '25 09:05 krisztianfekete

@halfcrazy, looks like this will be done by #3202

I'm afraid #3202 cannot resolve this issue. AFAIK, it cannot be resolved until https://github.com/kubernetes/kubernetes/pull/129160 is merged, and we need to register the informer metric too.

halfcrazy avatar May 09 '25 10:05 halfcrazy

/cc @sbueringer

xigang avatar May 09 '25 11:05 xigang

Yeah, we're going to wait for this to be implemented in k/k. Then we can probably pick it up (if it's safe from a metric cardinality point of view)

sbueringer avatar May 09 '25 11:05 sbueringer

Thanks @sbueringer ! Could you help review this PR https://github.com/kubernetes/kubernetes/pull/129160? It would be great to keep things moving forward.

xigang avatar May 09 '25 11:05 xigang

Huge backlog unfortunately at the moment. I'll try to get to it, but I can't promise it.

sbueringer avatar May 09 '25 12:05 sbueringer

@sbueringer Thank you.

xigang avatar May 09 '25 13:05 xigang

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 07 '25 13:08 k8s-triage-robot

/remove-lifecycle stale

halfcrazy avatar Aug 07 '25 14:08 halfcrazy

/lifecycle stale

k8s-triage-robot avatar Nov 05 '25 14:11 k8s-triage-robot

/remove-lifecycle stale

halfcrazy avatar Nov 06 '25 05:11 halfcrazy