Why does krr always recommend unsetting CPU limit?
This issue doesn't necessarily relate to a feature request or a bug, so I did not use the template for these.
This might be somewhat of an opinionated question, and I understand the reasoning behind the recommendation. However, when reporting on the percentage of CPU resources used by a container, we need to know the upper boundary (the CPU limit) in order to judge whether the container is appropriately sized.
If we remove the limit, then we can't calculate a percentage of CPU used.
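To make the point concrete, here is a minimal sketch (the function name and shape are mine, not krr's code) showing why a utilization percentage is only well-defined when a limit exists:

```python
from typing import Optional

def cpu_utilization_pct(usage_cores: float, limit_cores: Optional[float]) -> Optional[float]:
    """Utilization as a percentage of the CPU limit; undefined without a limit."""
    if limit_cores is None or limit_cores <= 0:
        return None  # no upper boundary, so a percentage cannot be computed
    return 100.0 * usage_cores / limit_cores

print(cpu_utilization_pct(0.25, 0.5))   # -> 50.0
print(cpu_utilization_pct(0.25, None))  # -> None
```

With the limit unset, the best you can report is absolute usage in cores, not a percentage of anything.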
I agree that setting a CPU limit can lead to CPU throttling; however, that is exactly where a tool like krr is useful, since it lets us size the limit appropriately.
This is based on this article
Still, while I agree that there are some use cases, I am not sure what should be recommended as a CPU limit (i.e., what the formula should be). If you know what it should be, you can propose it (or any other solution).
Also, you are always able to fork the code and add your own strategy (KRR currently has only a simple strategy, but it is built to be easily extensible).
> I agree that setting a CPU limit then potentially leads towards CPU throttling, however this is where a tool like krr is useful, such that we can size it appropriately.
>
> This is based on this article
>
> Still while I agree that there are some usecases, I am not sure what should be recommended as a limit for CPU (like what the formula should be. So if you know what it should be you can maybe propose it (or any other solution)
Dave Chiluk from Indeed suggests targeting between 0% and 10% throttling in his 2019 KubeCon talk: https://youtu.be/UE7QX98-kO0?t=2265
This is, presumably, the average throttling over the lifetime of the container. Obviously people should still be able to use their own custom logic if they have unique patterns, but it would be possible to calculate a container's average usage and then work out how much CPU would likely have brought it to a target throttling percentage. A more robust formula that accounts for a wider range of reasonable usage patterns should eventually be considered, but I think a flat-usage recommendation based purely on average CPU usage is a good place to start.
I believe we should target 3%: if 0-10% throttling is the target for efficient applications that are good cluster neighbors, then we should err on the side of performance within that range and suggest a CPU limit that would theoretically result in an average throttling of 2%.
That'd be an interesting formula to come up with, but I think it's possible.
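As one starting point, here is a crude sketch of such a formula. It assumes a flat-usage model where the throttled fraction is `(demand - limit) / demand` and measured average usage approximates demand; the function name, model, and defaults are all my invention, not krr's actual logic:

```python
def recommend_cpu_limit(avg_usage_cores: float, target_throttle: float = 0.02) -> float:
    """Sketch: pick a CPU limit that, under a flat-usage model, would yield
    roughly `target_throttle` average throttling.  Assumption-laden, not
    krr's real strategy."""
    if not 0.0 <= target_throttle < 1.0:
        raise ValueError("target_throttle must be in [0, 1)")
    # Flat-usage model: throttled fraction f = (demand - limit) / demand,
    # so limit = demand * (1 - f), with demand approximated by average usage.
    return avg_usage_cores * (1.0 - target_throttle)

print(recommend_cpu_limit(1.0, 0.02))  # -> 0.98
```

A real formula would need to account for burstiness (e.g., using a high usage percentile rather than the mean), since CFS throttling happens per quota period, not against lifetime averages.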
> Also you are always able to fork the code and to add your own strategy (KRR currently have only a simple strategy, but it is built for being easily extendible)
If it's not possible yet, I would love to be able to pass a strategy definition via the command line, so I can commit our project's strategy to the repo where it's used! This would help us codify application expectations, and I think it would add a lot of value.
@sd-matt-b we're open to PRs to support anything you need. We'd rather have the strategy in the codebase (assuming it's something you can contribute) and read its settings in from a YAML file in your project repo.
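A minimal sketch of what reading per-project settings might look like; the field names and schema here are invented for illustration, not krr's actual configuration:

```python
from dataclasses import dataclass

@dataclass
class StrategySettings:
    # Hypothetical field names -- krr's real settings schema may differ.
    target_throttle: float = 0.02
    cpu_percentile: float = 0.95

def settings_from_mapping(cfg: dict) -> StrategySettings:
    """Build settings from a mapping such as yaml.safe_load() returns."""
    return StrategySettings(
        target_throttle=float(cfg.get("target_throttle", 0.02)),
        cpu_percentile=float(cfg.get("cpu_percentile", 0.95)),
    )

# In a real setup, cfg would come from a file committed to the project repo, e.g.:
#   with open("krr-strategy.yaml") as f:
#       cfg = yaml.safe_load(f)   # requires PyYAML
cfg = {"target_throttle": 0.03}
print(settings_from_mapping(cfg))
```

Keeping the strategy code in the krr codebase and only the numbers in the repo's YAML file means each project can tune thresholds without forking.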
Maybe as a start there could be an option to stop reporting CPU limits as something bad, and to stop lowering the overall grade when they are detected? I personally think this whole "stop using CPU limits" trend is complete nonsense. Various issues were found in schedulers in the past, but those were bugs: they needed to be fixed, and over time they have been. Abandoning CPU limits entirely is unreasonable, so encouraging people to do so is also a bit problematic.