Support queuing in the QueueProxy while using randomChoice2Policy in the activator
In what area(s)?
/area API
Other classifications: /kind spec
Describe the feature
We are very willing to contribute to this feature request.
Background
The current behavior in Knative is:
- randomChoice2Policy can only be used when the hard limit is not set (code)
- the breaker (queue + semaphore) in the activator and QueueProxy will only be turned on when the hard limit is set
- when the activator and QueueProxy are both in the request path, and the hard limit is set (thus the breaker is turned on in both of them), requests will only be queued in the activator
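For context, here is a rough sketch of what the breaker does conceptually. This is illustrative only (hypothetical names, not the actual implementation in Knative's queue package): a bounded queue gates how many requests may wait, and a semaphore gates how many run concurrently.

```go
package main

import (
	"context"
	"errors"
)

// Illustrative breaker: a bounded queue plus a concurrency semaphore.
// This is a sketch, not Knative's real breaker.
type breaker struct {
	queue     chan struct{} // how many requests may be pending
	semaphore chan struct{} // how many requests may execute at once
}

func newBreaker(queueDepth, maxConcurrency int) *breaker {
	return &breaker{
		queue:     make(chan struct{}, queueDepth),
		semaphore: make(chan struct{}, maxConcurrency),
	}
}

// Maybe runs fn if a queue slot is free, waiting for a concurrency slot;
// otherwise it rejects the request immediately (overload).
func (b *breaker) Maybe(ctx context.Context, fn func()) error {
	select {
	case b.queue <- struct{}{}:
		defer func() { <-b.queue }()
	default:
		return errors.New("overload: queue is full")
	}

	select {
	case b.semaphore <- struct{}{}:
		defer func() { <-b.semaphore }()
		fn()
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	b := newBreaker(10, 2)
	_ = b.Maybe(context.Background(), func() { /* handle request */ })
}
```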
What we want
We'd like to have this behavior:
- the breaker is turned on in QueueProxy and off in the activator (so the requests are always queued in QueueProxy)
- the activator uses randomChoice2Policy
Feature requests
There are multiple ways to achieve that:
- option 1: implement feature requests 1 and 2 below
- option 2: implement feature request 3 below
Feature request 1 (being able to turn off the breaker in the activator when hard limit is set):
Add a flag to the config-features ConfigMap such that when it's true, the breaker in the activator will be turned off regardless of the hard limit.
Feature request 2 (being able to override the lb policy in the activator):
Add a flag to the config-features ConfigMap to override the lb policy in the activator regardless of the hard limit. (Currently the lb policy strictly depends on ContainerConcurrency, which is the hard limit)
Feature request 3 (being able to turn on the breaker in QueueProxy when the soft limit is set):
Add a flag to the config-features ConfigMap such that when it's true, the breaker in QueueProxy will be turned on regardless of the hard limit.
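To make feature requests 1-3 concrete, here is a rough sketch of the kind of flags we have in mind; the flag names and the featureFlags type are hypothetical and do not exist in Knative today:

```go
package main

import "fmt"

// Hypothetical flags for feature requests 1-3 (names made up for illustration).
type featureFlags struct {
	DisableActivatorBreaker    bool   // FR1: turn the activator breaker off even when a hard limit is set
	ActivatorLBPolicyOverride  string // FR2: e.g. "randomChoice2", regardless of ContainerConcurrency
	QueueProxyBreakerSoftLimit bool   // FR3: turn the QueueProxy breaker on even without a hard limit
}

// activatorBreakerEnabled sketches FR1: today the breaker is on iff the
// hard limit is set; the proposal lets operators force it off.
func activatorBreakerEnabled(hardLimitSet bool, f featureFlags) bool {
	return hardLimitSet && !f.DisableActivatorBreaker
}

func main() {
	f := featureFlags{DisableActivatorBreaker: true, ActivatorLBPolicyOverride: "randomChoice2"}
	fmt.Println(activatorBreakerEnabled(true, f)) // false: queuing happens only in the QueueProxy
}
```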
Hopefully either option 1 or option 2 above makes sense so we can get started on that. Regardless of the feature requests above, we'd like to add one more feature request: making the queue size in QueueProxy configurable.
Feature request 4 (being able to set the queue size in QueueProxy): Currently the queue size in QueueProxy is hard-coded to be 10 times the hard limit (code). We propose to add a per-revision config to override the hard-coded queue size.
All of the 4 feature requests above will be guarded by new flags for backward compatibility.
Can you please explain the use case behind this request?
What stops you from removing activator from the request path using tbc=0?
Thanks @nader-ziada and @vagababov!
The use case is that our system has optimal performance with the randomChoice2Policy (this is also why we want to keep activator in the request path). But we only want the activator for its load balancing (not queuing) because we'd like to queue requests inside QueueProxy (and provision more memory for it) to avoid OOM'ing the activator. (OOM'ing the QueueProxy is fine in our case because the blast radius is limited)
I'm assuming we can't get randomChoice2Policy without the activator, right?
> I'm assuming we can't get randomChoice2Policy without the activator, right?
You are correct: you need the activator to make that decision on where to send the request. The logic is based on the container concurrency:
- containerConcurrency == 0 -> randomChoice2Policy
- containerConcurrency <= 3 -> firstAvailableLBPolicy
- default -> RoundRobinPolicy
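In code, the selection is roughly this shape (a simplified sketch of the decision, not the exact Knative source):

```go
package main

import "fmt"

// pickLBPolicy sketches how the activator chooses a load-balancing policy
// from the revision's ContainerConcurrency (the hard limit).
func pickLBPolicy(containerConcurrency int) string {
	switch {
	case containerConcurrency == 0:
		return "randomChoice2Policy"
	case containerConcurrency <= 3:
		return "firstAvailableLBPolicy"
	default:
		return "roundRobinPolicy"
	}
}

func main() {
	fmt.Println(pickLBPolicy(0))  // randomChoice2Policy
	fmt.Println(pickLBPolicy(50)) // roundRobinPolicy
}
```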
Thanks @nader-ziada! Yes, the lb policy is based on the hard limit. Do you think either option 1 or 2 above makes sense? We'll start to work on the PR if so.
> Feature request 4 (being able to set the queue size in QueueProxy): Currently the queue size in QueueProxy is hard-coded to be 10 times the hard limit (code). We propose to add a per-revision config to override the hard-coded queue size.
We currently have queueDepth := 10 * env.ContainerConcurrency, and container concurrency can be set on a per-revision basis. So what you're proposing would be to make the 10 configurable as well? (just want to make sure I'm understanding correctly)
> We currently have queueDepth := 10 * env.ContainerConcurrency, and container concurrency can be set on a per-revision basis. So what you're proposing would be to make the 10 configurable as well? (just want to make sure I'm understanding correctly)
@psschwei Yes exactly! ContainerConcurrency also affects the autoscaling behavior and the size of the semaphore, so we think it's better to make the 10 configurable so we can adjust only the queue size.
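A rough sketch of what feature request 4 would change; the multiplier override (and wherever it would come from, e.g. a per-revision annotation) is hypothetical:

```go
package main

import "fmt"

const defaultQueueDepthMultiplier = 10 // today's hard-coded factor

// queueDepth sketches FR4: keep 10 as the default multiplier but let a
// per-revision override replace it (override <= 0 means "not set").
func queueDepth(containerConcurrency, multiplierOverride int) int {
	m := defaultQueueDepthMultiplier
	if multiplierOverride > 0 {
		m = multiplierOverride
	}
	return m * containerConcurrency
}

func main() {
	fmt.Println(queueDepth(5, 0))  // 50, same as today
	fmt.Println(queueDepth(5, 40)) // 200, with the hypothetical override
}
```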
> The use case is that our system has optimal performance with the randomChoice2Policy (this is also why we want to keep activator in the request path). But we only want the activator for its load balancing
So to put it slightly differently, you're seeing better performance with the activator as load balancer (using randomChoice2Policy) than a standard Kubernetes service (which is what handles the routing when activator is not in the path). I wonder if there's a way to improve the Kubernetes routing performance, for example maybe using IPVS mode.
Not opposed to your proposal, just want to work through what the right spot to try to fix is...
> So to put it slightly differently, you're seeing better performance with the activator as load balancer (using randomChoice2Policy) than a standard Kubernetes service (which is what handles the routing when activator is not in the path). I wonder if there's a way to improve the Kubernetes routing performance, for example maybe using IPVS mode.
@psschwei That makes sense! We can explore ways to improve the Kubernetes routing performance. Is it possible for you to accept our Feature Request 4 above first? We can continue the discussion around options 1 & 2 proposed above, while implementing the PR for Feature Request 4. That feature request could be helpful for people who like to set hard limits without using the activator. I can file another ticket (just for feature request 4) if that's easier for you.
I'm good with 4 (and don't think it needs a separate issue), but @dprotaso has the final call
@dprotaso could you please take a look at Feature Request 4 here? If it makes sense, I'll go ahead and implement it first.
Feature request 4 (being able to set the queue size in QueueProxy) makes sense to me, but I believe there should be limits on it. The queue proxy is supposed to be a small buffer, and if this got too big, it could have a negative impact on performance.
@nader-ziada do you think capping the queue size at 100 * env.ContainerConcurrency makes sense?
So the proposed change is, at this line, to set the queueDepth to X * env.ContainerConcurrency, where:
- X = 10 by default
- users can override X
- X is capped at 100 if the user tries to set a value higher than 100
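A small sketch of the capped multiplier described in the list above (all names hypothetical):

```go
package main

import "fmt"

// effectiveMultiplier clamps the user-requested multiplier X: default 10
// when unset, capped at 100 when the user asks for more.
func effectiveMultiplier(requested int) int {
	switch {
	case requested <= 0:
		return 10
	case requested > 100:
		return 100
	default:
		return requested
	}
}

func main() {
	fmt.Println(effectiveMultiplier(0))   // 10 (default)
	fmt.Println(effectiveMultiplier(250)) // 100 (capped)
}
```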
The queue-proxy is meant to be a small buffer for requests, allowing the Activator to have a clear signal that a request has been accepted for processing. So I worry that 100 might be too much.
@dprotaso @psschwei wdyt?
@nader-ziada @dprotaso @psschwei thanks for all the comments! Is it possible for us to get on a call with you guys? We just hope to facilitate the discussion. If you guys think this feature request makes sense, we'll start to implement it in Knative. Otherwise we'll think of another way (for example, we can develop our own "QueueProxy" inside the user container).
Hi @lichenran1234
I personally think this might cause performance issues in the queue-proxy if the buffer becomes too big; 100 feels like too much, but I can't tell for sure without actual measurements.
Anyways, it would be great if you could join the core-api working group call on Wednesdays and discuss with everyone; more info about the working group is here: https://github.com/knative/community/blob/619dba57d971e7751037d3f0e42c801540eb8142/working-groups/WORKING-GROUPS.md#api-core
+1 to discussing at the serving WG call
@nader-ziada do either the kperf or mako CI tests generate the appropriate data to benchmark the upper bound?
@nader-ziada @psschwei Thanks! We'll join the Serving & Networking WG call tomorrow morning at 9:30 PST
@dprotaso summary of our discussion in the Knative Serving Working Group meeting today:
Reasons we want to make the queue-proxy queue size configurable (currently it's hard-coded to be 10 * container-concurrency):
- we would like to remove the activator from the request path and queue requests only inside the queue-proxy. We run different customers' workloads inside the same cluster, and one customer's requests piling up in the activator might cause OOM issues on the activator, which would affect all other customers. Queuing in the queue-proxy limits the blast radius to a single customer's containers.
- we would like to queue more requests than the current hard-coded value because we have some latency-tolerant customers, who would like their requests to be queued longer rather than being rejected.
Other people had concerns about performance if we increase the queue size, but @dprotaso thought it was fine because:
- we are not changing the default queue size
- it's the Knative user's responsibility (rather than Knative's) to make sure the queue size is set properly
Overall we are good to go on this feature request to make the queue-proxy queue size configurable.
@dprotaso also reminded us of the timeout limit in the queue-proxy (if the queue size is too big, it's easier to hit that timeout), but we do have a client-side timeout set, so we should be fine.
I had a fleeting thought that we might want to consider putting this change behind a gate (sort of like how we do with enable-scale-to-zero). I could see a scenario where some platform team might not want to give users the ability to grow the queue for fear of a similar growth in support tickets... though on second thought there are already plenty of ways for users to get into that kind of trouble, so I think gating would most likely be overkill here.
> I could see a scenario where some platform team might not want to give users the ability to grow the queue for fear of a similar growth in support tickets
I was thinking about this earlier - I think it might make sense to introduce this as an operator-only knob.
i.e., they'd configure a property in the config-deployment ConfigMap:
https://github.com/knative/serving/blob/main/config/core/configmaps/deployment.yaml
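As a sketch of the operator-only knob idea, parsing a hypothetical key from that ConfigMap's data might look like this (the key name queue-sidecar-queue-depth-multiplier is made up and does not exist in config-deployment today):

```go
package main

import (
	"fmt"
	"strconv"
)

// multiplierFromDeploymentConfig reads a hypothetical operator-only knob
// from the config-deployment ConfigMap data, falling back to today's
// hard-coded default of 10 when the key is absent.
func multiplierFromDeploymentConfig(data map[string]string) (int, error) {
	raw, ok := data["queue-sidecar-queue-depth-multiplier"]
	if !ok {
		return 10, nil
	}
	return strconv.Atoi(raw)
}

func main() {
	m, _ := multiplierFromDeploymentConfig(map[string]string{
		"queue-sidecar-queue-depth-multiplier": "40",
	})
	fmt.Println(m) // 40
}
```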
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.