[BUG] unset CPU limit is overridden with CPU request
Describe the bug
A container task with only a CPU request and no CPU limit set still gets a CPU limit applied:
```python
from flytekit import ContainerTask, kwtypes, workflow, Resources

hello_task = ContainerTask(
    name="hello",
    image="ubuntu:20.04",
    requests=Resources(cpu="2", mem="1Gi"),
    limits=Resources(mem="2Gi"),
    command=["echo", "hello"],
)
```
Running this task results in a pod with the CPU limit also set to 2 (the same as the request), but there should be no CPU limit at all.
Expected behavior
If no CPU limit is specified, it should also not be set implicitly, and no CPU limit should be applied in k8s.
I therefore propose to copy the requests to the limits as a whole only when the limits are completely unset, instead of filling in each missing limit with the corresponding request value (see the sketch below).
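To make the proposal concrete, here is a minimal sketch of the merge logic on plain dicts (illustrative only; the function names and representation are made up and this is not the actual flytekit/flytepropeller code):

```python
from typing import Optional


def resolve_limits(requests: dict, limits: Optional[dict]) -> dict:
    """Proposed: fall back to the requests only when no limits are given
    at all; never fill in individual missing limits."""
    if not limits:
        return dict(requests)
    return dict(limits)


def resolve_limits_current(requests: dict, limits: Optional[dict]) -> dict:
    """Current behavior (for contrast): each missing limit is filled from
    the corresponding request, which is where the implicit CPU limit comes from."""
    merged = dict(requests)
    merged.update(limits or {})
    return merged


requests = {"cpu": "2", "memory": "1Gi"}
limits = {"memory": "2Gi"}
print(resolve_limits(requests, limits))          # {'memory': '2Gi'}  -> no CPU limit
print(resolve_limits_current(requests, limits))  # {'cpu': '2', 'memory': '2Gi'}
```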
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes
We have just encountered this today as well; in our case the request is being set at the task level.
Here is the schema from flytectl's point of view (`flytectl get workflow -p flytesnacks -d development test_workflow -o json --version test_version`):
... snipped to the section of interest...
"resources": {
"requests": [
{
"name": "CPU",
"value": "64"
},
{
"name": "MEMORY",
"value": "950Gi"
}
]
}
Pod spec:
```yaml
resources:
  limits:
    cpu: "64"
    memory: 950Gi
  requests:
    cpu: "64"
    memory: 950Gi
```
For more context: in many cases, setting CPU limits (as opposed to requests) results in unused CPU due to throttling; see also https://home.robusta.dev/blog/stop-using-cpu-limits.
Need to explore this further, but Flyte does this because of k8s pod QoS. If the requests differ from the limits, k8s frequently preempts the pods to schedule others. By setting them the same, k8s will not prematurely evict the pod to schedule another.
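For reference, a rough sketch of how k8s assigns the QoS class for a single-container pod (simplified pseudologic, not the actual kubelet code):

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Very simplified k8s QoS classification for a single-container pod."""
    if not requests and not limits:
        return "BestEffort"
    if all(
        res in requests and res in limits and requests[res] == limits[res]
        for res in ("cpu", "memory")
    ):
        return "Guaranteed"
    return "Burstable"


# With the implicit CPU limit Flyte adds, the pod above is Guaranteed;
# dropping the CPU limit makes it Burstable and thus a candidate for
# eviction under node resource pressure.
print(qos_class({"cpu": "64", "memory": "950Gi"}, {"cpu": "64", "memory": "950Gi"}))  # Guaranteed
print(qos_class({"cpu": "64", "memory": "950Gi"}, {"memory": "950Gi"}))               # Burstable
```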
We were wondering if it is the right decision to set it by default instead of letting the user decide.
In our case most tasks are scheduled on exclusive nodes. Without manually overriding the limits, users currently have to set requests very close to the node's maximum CPU and memory; otherwise their pod is throttled and hits OOM earlier than necessary.
We currently work around this by automatically overriding the limits on every task, along the lines of the sketch below.
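For example, something like this (a hypothetical `capped_task` helper; the node-capacity values are made up for illustration):

```python
from flytekit import task, Resources

# Hypothetical node capacity, used as the explicit limit on every task.
NODE_CPU = "64"
NODE_MEM = "950Gi"


def capped_task(*args, **kwargs):
    """Wrap flytekit's @task so explicit limits are always set and the
    implicit limits-from-requests behavior never kicks in."""
    kwargs.setdefault("limits", Resources(cpu=NODE_CPU, mem=NODE_MEM))
    return task(*args, **kwargs)


@capped_task(requests=Resources(cpu="2", mem="1Gi"))
def hello() -> str:
    return "hello"
```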
> We were wondering if it is the right decision to set it by default instead of letting the user decide.
My 2 cents here: letting the user decide, rather than enforcing an unintuitive behavior, is never a bad choice.
Sure, there can be a default behavior, but the user should be given the option to leave the limits unset (unlimited) depending on the use case. And, as @flixr pointed out, there are good reasons to do so sometimes.
"Hello 👋, this issue has been inactive for over 90 days. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏"
Bump, still a very annoying issue that prevents us from fully utilizing our cluster!
We are thinking of removing this implicit limit. There is a noisy-neighbor problem that occurs and then causes non-deterministic failures. Cc @EngHabu
"Hello 👋, this issue has been inactive for over 90 days. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏"
Hello 👋, this issue has been inactive for over 90 days and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏