Add per-host connection limits to DestinationRule
Description
This PR adds support for per-host connection limits in the API by introducing a new perHostLimits field to ConnectionPoolSettings. This feature aligns with Envoy's per_host_thresholds capability for circuit breakers.
Motivation
Currently, Istio's DestinationRule only allows setting a global connection limit for the entire cluster, independent of the number of endpoints. This makes it difficult to properly manage concurrency for destination services, especially in autoscaling scenarios where the number of replicas changes dynamically.
Per-host connection limits allow controlling connections to each individual endpoint, which:
- Prevents overload of individual hosts
- Manages concurrency properly in autoscaling scenarios
- Maintains healthy connection limits without hitting overloaded applications
The underlying Envoy cluster circuit breaker already has this per_host_thresholds capability. Other Envoy-based tools (Envoy Gateway, Contour) have already added this support.
Example Usage
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: reviews-per-host-limits
spec:
  host: reviews.prod.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      perHostLimits:
        tcp:
          maxConnections: 10
Limitations
Currently only the maxConnections field is supported for per-host limits, as per Envoy's circuit breaker implementation. Other fields in TCPSettings will be ignored.
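For reference, the example above is expected to translate to roughly the following Envoy cluster circuit breaker configuration (a hand-written sketch of Envoy's CircuitBreakers message, not generated output):

circuit_breakers:
  thresholds:
  - max_connections: 100      # existing cluster-wide limit (tcp.maxConnections)
  per_host_thresholds:
  - max_connections: 10       # new per-endpoint limit (perHostLimits.tcp.maxConnections)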
Testing
- Generated CRD files successfully with make gen
- Test case not included due to CRD validation cost limits (will be validated in istio/istio integration)
Fixes #57697
🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test?
Courtesy of your friendly test nag.
@ramaraochavali, @howardjohn, @therealmitchconnors, @keithmattix can you please share your thoughts when you get some time.
IMO PerHostThresholds makes sense for DFP-type clusters, where each host can point to a different upstream service/external endpoint, but is not very useful for regular clusters. The cluster load-balancing algorithm (Least Request) will almost always ensure requests are sent to hosts that can handle them with low latency, while still ensuring the cluster-level circuit breakers are honored to prevent cascading failures. Adaptive concurrency is a better choice to handle autoscaling behaviors. Also, PerHostThresholds only supports max connections. Even with connections under the limit, the backend host can be overwhelmed with a lot of requests in some protocols like HTTP/2.
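For illustration, the kind of existing mechanism I have in mind is roughly the following DestinationRule (service name and numbers are just examples):

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: reviews-cluster-limits
spec:
  host: reviews.prod.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST      # steer requests toward endpoints with the fewest active requests
    connectionPool:
      tcp:
        maxConnections: 100      # cluster-wide connection cap
      http:
        http2MaxRequests: 1000   # cluster-wide cap on concurrent requests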
Thank you for the thoughtful review, I agree with several of the points raised and appreciate the perspective.
I agree that per-host thresholds are most obviously useful for DFP-style clusters, where individual endpoints may represent very different upstream services. I also agree that for regular service clusters, Envoy’s Least Request load balancing and cluster-level circuit breakers already do a good job of steering traffic away from overloaded endpoints and preventing cascading failures.
That said, I believe per-host connection limits still provide complementary value, even for regular Kubernetes service clusters, particularly in the following cases:
Proactive protection vs reactive steering
Least Request and outlier detection are reactive mechanisms: they rely on latency or failures that have already occurred. Per-host connection limits provide a hard, proactive guardrail that prevents an endpoint from being overwhelmed in the first place, especially during sudden traffic spikes or cold-start scenarios.
Autoscaling lag and uneven capacity
In autoscaling environments, new pods often start with:
- cold caches
- reduced CPU availability
- delayed readiness at the application layer
During these windows, per-host connection limits help ensure that a newly added or temporarily degraded pod is not immediately saturated, even if it technically passes readiness checks.
Connection pressure still matters, even with HTTP/2
I agree that per_host_thresholds currently only support max_connections, and that for protocols like HTTP/2 a single connection can still carry many concurrent streams. However, in practice:
- connection establishment itself is not free (TLS, memory, file descriptors)
- many applications still exhibit degradation under high connection churn or concurrent streams
- limiting connections can still meaningfully reduce pressure on constrained backends
While not a complete concurrency control solution, per-host limits act as a coarse but effective safety mechanism.
Parity with Envoy and ecosystem tooling
Since Envoy already supports per_host_thresholds, and other Envoy-based projects (Envoy Gateway, Contour) have exposed this capability, providing it in Istio improves configuration parity and reduces surprises for users migrating between tools or using Istio in heterogeneous environments.
I fully agree that adaptive concurrency control is a superior long-term solution for handling autoscaling and dynamic load, and I see per-host connection limits as complementary rather than competing with that direction. Adaptive concurrency addresses request-level saturation, while per-host connection limits provide a simpler, deterministic control at the connection level.
If the concern is around:
- encouraging misuse,
- limited effectiveness for HTTP/2,
- or unclear guidance for users,
I would be happy to:
- clearly document the intended use cases and limitations,
- scope the feature explicitly as an advanced / opt-in capability,
- or align the API more closely with Envoy semantics to avoid confusion.
I am happy to go in either direction, just wanted to add my point of view.
Adaptive concurrency is a better choice to handle autoscaling behaviors. Also, PerHostThresholds only supports max connections. Even with connections under the limit, the backend host can be overwhelmed with a lot of requests in some protocols like HTTP/2.
I somewhat agree @ramaraochavali, even though I am really keen on (at least) getting some per-host connection limit in (I opened the issue https://github.com/istio/istio/issues/57697 after all). Yes, e.g. HTTP/2 streams are an amplification with regard to concurrency, and it would be awesome if the request queue could be limited per number of hosts (or multiples thereof). So if I run with 3 backends I might allow for 100 pending requests, but when scaling up to 30 I might also want to increase the request queue to 1000 accordingly.
This kind of "adaptive concurrency" was suggested in https://github.com/istio/istio/issues/25991 and is supported by Envoy via https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/adaptive_concurrency_filter. @prashanthjos has already stated in https://github.com/istio/istio/issues/25991#issuecomment-3663439070 that he might also take a look at that one. It's a bigger change though.
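For reference, wiring that filter up today would require something like the following EnvoyFilter (a rough sketch based on the Envoy docs; names and values are illustrative and untested):

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: adaptive-concurrency
spec:
  workloadSelector:
    labels:
      app: reviews               # hypothetical target workload
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.adaptive_concurrency
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.adaptive_concurrency.v3.AdaptiveConcurrency
          gradient_controller_config:
            sample_aggregate_percentile:
              value: 90
            concurrency_limit_params:
              concurrency_update_interval: 0.1s
            min_rtt_calc_params:
              interval: 30s
              request_count: 50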
But in the end it all comes down to managing the effects of Little's Law. Higher latency very quickly causes a concurrency explosion, driving the latency per request higher and higher until there are timeouts or the backend cannot take it anymore (if it does not implement some limit itself).
Only using and trusting Outlier Detection makes managing concurrency a reactive task (there have to be traffic-affecting issues already for Envoy to take action), even though Envoy has capabilities with regard to maintaining a healthy flow of requests without static configuration of connection counts or even rate limits (which never hit the right number).
To me these capabilities are
a) adaptive LB algos like PEWMA (https://github.com/envoyproxy/envoy/issues/20907 -> https://github.com/istio/proxy/pull/6690) or Prequal (https://github.com/envoyproxy/envoy/issues/42091), which send requests to the backend likely to result in the lowest latency and, looking again at Little's Law, serve the user best (low latency) while also maximizing overall rps (at the same concurrency).
b) adaptive concurrency to not overwhelm individual backends with connections / requests as suggested in https://github.com/istio/istio/issues/25991
Per-host connection limits provide a hard, proactive guardrail that prevents an endpoint from being overwhelmed in the first place, especially during sudden traffic spikes or cold-start scenarios.
These types of issues are better solved by warm-up configuration than by circuit breakers.
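For example, something along the lines of slow start via warmupDurationSecs on the DestinationRule (an illustrative sketch; values are placeholders):

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: reviews-warmup
spec:
  host: reviews.prod.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST
      warmupDurationSecs: 120s   # ramp traffic to newly added endpoints gradually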
connection establishment itself is not free (TLS, memory, file descriptors)
This is true whether you do it at the cluster level or at the host level. But practically for HTTP/2, Envoy would not create more than one connection per worker thread unless max_concurrent_streams is reached. Even high-throughput apps won't create more connections in HTTP/2.
Only using and trusting on Outlier Detection makes managing concurrency a reactive task (there have to be traffic affecting issues already for Envoy to take action),
I am not advocating outlier detection to handle this. I am just saying static per-host connection limits won't help much.
@ramaraochavali I agree that, in practice, this change is unlikely to have a significant impact for most standard Kubernetes service clusters, especially given Envoy’s existing load-balancing behavior and cluster-level circuit breakers. That said, I also don’t see any real downside or risk in exposing this capability, since it is already supported by Envoy and aligns Istio with the broader Envoy ecosystem.
While per-host connection limits may not materially improve concurrency management in many HTTP/2-heavy workloads, they can still serve as a simple, deterministic guardrail in certain scenarios and provide configuration parity with tools like Envoy Gateway and Contour. As long as the feature is clearly documented with its limitations and positioned as an advanced or niche control rather than a primary concurrency mechanism, I don’t think it introduces confusion or harm.
Given that, I’m comfortable with this moving forward and would defer to the broader perspective and judgment of @ramaraochavali, @howardjohn, @therealmitchconnors, and @keithmattix for the final decision.
Please do let me know.
One additional point worth calling out is that per-host connection limits are significantly more valuable for pure TCP workloads (e.g., Redis, DBs, Kafka) than for HTTP/2 services. For these protocols, connections map directly to backend resource consumption (threads/event loops, memory, file descriptors). There is no request multiplexing or adaptive concurrency signal, so limiting connections at the per-host level is one of the few effective and generic guardrails available to prevent individual endpoints from being overwhelmed, especially during bursts, restarts, or scaling events. Many of the concerns around limited effectiveness (HTTP/2 stream amplification, Envoy connection reuse, etc.) do not apply in this case. Adaptive concurrency and advanced LB algorithms are excellent directions, but they primarily address request-level saturation and are not applicable to raw TCP protocols.
Given that Envoy already supports per_host_thresholds, and similar tooling has exposed this capability, adding this to Istio seems reasonable as a complementary, opt-in mechanism, particularly for TCP services, provided the limitations are clearly documented.
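To make the TCP case concrete, here is how the proposed field could be used for, say, a Redis service (host name and numbers are illustrative):

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: redis-per-host-limits
spec:
  host: redis.prod.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 300      # total connections across all Redis endpoints
      perHostLimits:
        tcp:
          maxConnections: 50     # hard cap on connections to each individual endpoint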
@keithmattix is OK with the change. @howardjohn, @therealmitchconnors, can you please leave your thoughts so that we can get alignment on the PR?
The bar for adding new fields to a stable API is high... is there a reason this cannot be achieved through EnvoyFilters or some other ProxyConfig? I think we'd need strong user demand to add this to DestinationRule.
The bar for adding new fields to a stable API is high... is there a reason this cannot be achieved through EnvoyFilters or some other ProxyConfig? I think we'd need strong user demand to add this to DestinationRule.
I would really like to have managed concurrency, as stated before. The current state of a limit unrelated to the number of replicas does not make (much) sense.
In the end, everything can be achieved through EnvoyFilters. But they, for very good reasons, come with a clear warning every time. It's just not a good approach to one's traffic management to free-solo EnvoyFilters left and right.
But, agreeably, it makes no sense to simply expose everything Envoy can do; the goal should be a well-curated abstraction on top.
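For completeness, the EnvoyFilter route would look roughly like this (an untested sketch; the service name is just an example), which is exactly the kind of raw Envoy surface I'd rather not point users at:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: reviews-per-host-limits
spec:
  configPatches:
  - applyTo: CLUSTER
    match:
      context: SIDECAR_OUTBOUND
      cluster:
        service: reviews.prod.svc.cluster.local
    patch:
      operation: MERGE
      value:
        circuit_breakers:
          per_host_thresholds:
          - max_connections: 10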
your call
+1 to what @therealmitchconnors said
I would really like to have managed concurrency, as stated before. The current state of a limit unrelated to the number of replicas does not make (much) sense.
I agree with the desired goal but I still do not think per-host thresholds is the right answer for it. As you correctly mentioned in some other thread/context, adaptive concurrency solves it better, and if we want to change the API, I would like to go in that direction, keeping in mind how the Gateway API is also evolving in that direction.
I agree with the desired goal but I still do not think per-host thresholds is the right answer for it.
But the current, "unrelated to host count" thresholds are even less helpful, even confusing to people writing DestinationRules expecting this to be per target / replica.
And it's not that per-host thresholds don't help at all ... yes, it's a connection limit, not a RIF (requests in flight) or stream limit. But since we are talking in-mesh, we kinda know whether we are dealing with HTTP/1.1 or HTTP/2, and I could easily do the math (connection count * HTTP stream limit).
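Purely as an illustration of that math (numbers are made up): with a per-host limit of 10 connections and an HTTP/2 max_concurrent_streams of 100, each endpoint is effectively bounded at about 10 × 100 = 1000 concurrent streams.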
Thanks everyone for the detailed discussion and thoughtful feedback on this PR. I appreciate the time and perspectives shared so far.
This PR has been open for a little over a month now, and we’ve had several rounds of discussion around scope, applicability, and API evolution. To avoid leaving this in a prolonged state of uncertainty, I think it would be helpful to converge on a clear decision.
From my understanding so far:
@keithmattix is OK with the change.
@ramaraochavali has raised concerns about introducing per-host connection limits and prefers a different direction (which I interpret as a NOT OK for this PR as it stands).
@howardjohn may be unavailable at the moment, so I’m not assuming a position there.
To help us arrive at a clear outcome, could @therealmitchconnors please indicate an explicit OK / NOT OK on this PR? That would allow us to determine whether there is sufficient alignment to move forward, or whether we should close or re-scope this work rather than leaving it hanging.
Happy to follow whatever decision the group converges on, the goal here is simply to reach clarity.
Thanks again.