Limit Safemode List Digest

Open • Azoam opened this issue 3 years ago • 1 comment

What does this PR do?

[ ] Adds new functionality
[ ] Alters existing functionality
[X] Fixes a bug
[ ] Improves documentation or testing

Please briefly describe your changes as well as the motivation behind them:

We found that the safemode feature, which checks cluster information to determine blast radius, was using a lot of memory and causing pods to OOM. This PR is a potential fix: it limits how many objects are digested at a time, so that the full object list should no longer fill up memory.

Regarding the branch name containing "etcd": the original idea was to use etcd for the object count lookup, but this fell through, and we now use the limiting functionality of the client API instead.
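
To illustrate the idea of digesting a limited number of objects at a time, here is a minimal sketch of counting through a paginated list. It is not the code from this PR: `PageLister`, `CountInPages`, and `fakeLister` are hypothetical names, and the in-memory lister only simulates a cluster. In a real client the page size and continue token would presumably come from the list options of the Kubernetes client (for example `Limit` and `Continue`), which is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// PageLister is a hypothetical stand-in for a Kubernetes client's List call.
// It returns at most limit items starting at the position encoded in token,
// plus a token for the next page ("" when there are no more pages).
type PageLister interface {
	ListPage(limit int, token string) (items []string, next string, err error)
}

// CountInPages counts all objects by walking the list one page at a time,
// so at most pageSize items are held in memory at once.
func CountInPages(l PageLister, pageSize int) (int, error) {
	if pageSize <= 0 {
		return 0, errors.New("pageSize must be positive")
	}
	total, token := 0, ""
	for {
		items, next, err := l.ListPage(pageSize, token)
		if err != nil {
			return 0, err
		}
		total += len(items)
		if next == "" {
			return total, nil
		}
		token = next
	}
}

// fakeLister simulates a cluster with n objects (an assumption for the demo)
// and records the largest page it has served.
type fakeLister struct {
	n       int
	maxPage int
}

func (f *fakeLister) ListPage(limit int, token string) ([]string, string, error) {
	start := 0
	if token != "" {
		v, err := strconv.Atoi(token)
		if err != nil {
			return nil, "", fmt.Errorf("bad continue token %q: %w", token, err)
		}
		start = v
	}
	if start > f.n {
		start = f.n
	}
	end := start + limit
	if end > f.n {
		end = f.n
	}
	items := make([]string, 0, end-start)
	for i := start; i < end; i++ {
		items = append(items, "pod-"+strconv.Itoa(i))
	}
	if len(items) > f.maxPage {
		f.maxPage = len(items)
	}
	next := ""
	if end < f.n {
		next = strconv.Itoa(end)
	}
	return items, next, nil
}

func main() {
	cluster := &fakeLister{n: 10500}
	count, err := CountInPages(cluster, 500)
	if err != nil {
		panic(err)
	}
	fmt.Println(count, cluster.maxPage) // prints: 10500 500
}
```

The trade-off in this sketch is more round trips in exchange for bounded memory: the count is still exact, but no single response holds more than one page of objects.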

Code Quality Checklist

[X] The documentation is up to date.
[X] My code is sufficiently commented and passes continuous integration checks.
[X] I have signed my commit (see Contributing Docs).

Testing

When this bug was first observed, the injector pods were OOMing or timing out. I tested this branch in an environment with 10k+ pods and saw no OOMs or timeouts.

Jul 21 '22 18:07 Azoam

I had hoped there was a way to query the API for just the total count, not a full list, but it seems that's lacking

From my understanding and research, there is a way to query for the count, but it requires specific configuration of the Kubernetes environment, which not everyone may have. Those who do have it configured may also have it locked behind administrator permissions. So it seems like a safe bet to use the generic client, which should be generally accessible.

Aug 02 '22 17:08 Azoam