Envoy HTTP local rate limit: inconsistent tokens available under load
We currently implement an EnvoyFilter through Istio for HTTP local rate limiting. We set the rate limit at 100 requests per second (which I acknowledge is fairly low) across 1 gateway pod, so we expected to see around ~100 RPS (6,000 rpm) allowed. However, as our traffic climbed to 5,000 RPS (300,000 rpm), we saw the tokens available go up to around 300 RPS (19,000 rpm).
With that said, some key points: we are using descriptor paths to give our health checks a higher limit, but since we aren't calling that path I don't think it should matter. For the test we also ran in count-only mode, with the filter not enforced. Autoscaling was turned off for this load test, and CPU at the time was around 46%.
Looking to see if anyone has an explanation for this behavior, since it worked perfectly fine up until about 2,000 RPS.
cc @wbpcode who has been looking at this recently
Could you provide the configuration used in your test?
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          value:
            stat_prefix: http_local_rate_limiter
  - applyTo: HTTP_ROUTE
    match:
      context: GATEWAY
    patch:
      operation: MERGE
      value:
        route:
          rate_limits:
          - actions:
            - remote_address: {}
          - actions:
            - header_value_match:
                descriptor_value: healthcheck
                expect_match: false
                headers:
                - name: :path
                  string_match:
                    ignore_case: true
                    prefix: /healthcheck/
        typed_per_filter_config:
          envoy.filters.http.local_ratelimit:
            '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            always_consume_default_token_bucket: false
            descriptors:
            - entries:
              - key: header_match
                value: healthcheck
              # per-descriptor bucket: 100 tokens, refilled every second
              token_bucket:
                fill_interval: 1s
                max_tokens: 100
                tokens_per_fill: 100
            enable_x_ratelimit_headers: DRAFT_VERSION_03
            # enabled for 100% of requests...
            filter_enabled:
              default_value:
                denominator: HUNDRED
                numerator: 100
              runtime_key: http_local_rate_limiter
            # ...but enforced for 0%, i.e. count-only mode for the test
            filter_enforced:
              default_value:
                denominator: HUNDRED
                numerator: 0
              runtime_key: http_local_rate_limiter
            response_headers_to_add:
            - append: false
              header:
                key: x-local-rate-limit
                value: "true"
            stat_prefix: http_local_rate_limiter
            # default bucket: 1100 tokens, refilled every second
            token_bucket:
              fill_interval: 1s
              max_tokens: 1100
              tokens_per_fill: 1100
  workloadSelector:
    labels:
      app.kubernetes.io/component: istio-ingressgateway
This was the configuration for the test.
cc @cam634 The configuration seems fine. From the configuration, requests with the /healthcheck/ path will be limited to 100/s and other requests will be limited to 1100/s? Right?
Are you sure the test requests have the expected prefix? Could you also give an example?
(PS: the current local rate limit uses a timer-based token bucket, which can sometimes be inaccurate. But your results still seem weird because the difference is too big.)
Yeah, that's exactly what we are trying to do.
Are you sure the test requests have the expected prefix? Could you also give an example?
This was my original thought as well, but what debunked it for me was that we hold at the correct rate-limited number until about 2,000 RPS. We were calling a path called /ping.
I have not looked at the current code, but could the filter possibly be overwhelmed at certain rates? The rate at which it leaks above the limit doesn't seem to directly correlate with the incoming RPS.
We are going to test at more realistic numbers, but I am trying to understand why it would happen even at the lower levels.
Is there anything else you would recommend looking into? CC: @wbpcode
@cam634 Actually, I have no other ideas about this problem if the HPA is really disabled. I am working on a new token bucket solution which should provide more stable results, but it still needs some more time.
In case someone else stumbles upon this: we retested with a higher limit and, separately, with a longer time frame. The higher limit with the 1-second fill interval didn't work at all. Then, after increasing the time frame, we saw the issue arise each time at 150,000 rpm (2,500 RPS); past that rate the effective limit doubles. We also confirmed we saw no HPA activity.
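For illustration, a bucket over a longer time frame for the same nominal 100 RPS limit might look roughly like the snippet below; the 60s interval and token counts are assumed for the sketch, not the exact values used in the retest.
            # hypothetical per-minute bucket for a nominal 100 RPS limit
            token_bucket:
              fill_interval: 60s
              max_tokens: 6000
              tokens_per_fill: 6000
              # note: the full 6000-token allowance can be consumed as a burst
              # at any point within the 60s interval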
Follow-up: we found the reason. It turns out the limit is applied per host on our ingress gateway. For example, we have one route for test1.com and another for test2.com, and each of them gets its own limit. Can we have a combined limit? CC: @wbpcode
Figured it out: it's because the HTTP_ROUTE patch distributes the per-route config to each service's route rather than taking effect once at the Istio gateway, so each service ends up with its own rate limit.
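If a single limit shared across all hosts/routes on the gateway is the goal, one option may be to configure the token bucket on the listener-level HTTP filter itself instead of via the per-route typed_per_filter_config, since each per-route override owns its own bucket while a filter-level bucket is shared by every request passing through that listener. A minimal, untested sketch of that variant, reusing the names and numbers from the config above:
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          value:
            stat_prefix: http_local_rate_limiter
            # one bucket owned by the listener-level filter, shared by all hosts/routes
            token_bucket:
              fill_interval: 1s
              max_tokens: 1100
              tokens_per_fill: 1100
            filter_enabled:
              default_value:
                numerator: 100
                denominator: HUNDRED
            filter_enforced:
              default_value:
                numerator: 100
                denominator: HUNDRED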