Envoy HTTP local rate limit: inconsistent tokens available under load
We currently implement an EnvoyFilter through Istio for HTTP local rate limiting. We set the rate limit at 100 requests per second (which I acknowledge is fairly low) across 1 gateway pod, so we expected to see around ~100 RPS (6,000 rpm) allowed. However, as our traffic climbed to 5,000 RPS (300,000 rpm), we saw the tokens available go up to around 300 RPS (19,000 rpm).
With that said, some key points: we are using descriptor paths to give our health checks a higher limit, but since we aren't calling that path I don't think it should matter. For the test we also ran in count-only mode, with the filter not enforced. Autoscaling was turned off for this load test, and CPU at the time was around 46%.
Looking to see if anyone has an explanation for this behavior, since it worked perfectly fine up until about 2,000 RPS.
cc @wbpcode who has been looking at this recently
Could you provide the configuration used in your test?
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          value:
            stat_prefix: http_local_rate_limiter
  - applyTo: HTTP_ROUTE
    match:
      context: GATEWAY
    patch:
      operation: MERGE
      value:
        route:
          rate_limits:
          - actions:
            - remote_address: {}
          - actions:
            - header_value_match:
                descriptor_value: healthcheck
                expect_match: false
                headers:
                - name: :path
                  string_match:
                    ignore_case: true
                    prefix: /healthcheck/
        typed_per_filter_config:
          envoy.filters.http.local_ratelimit:
            '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            always_consume_default_token_bucket: false
            descriptors:
            - entries:
              - key: header_match
                value: healthcheck
              # per-descriptor bucket: 100 tokens, refilled every second
              token_bucket:
                fill_interval: 1s
                max_tokens: 100
                tokens_per_fill: 100
            enable_x_ratelimit_headers: DRAFT_VERSION_03
            # enabled for 100% of requests...
            filter_enabled:
              default_value:
                denominator: HUNDRED
                numerator: 100
              runtime_key: http_local_rate_limiter
            # ...but enforced for 0%, i.e. count-only mode for the test
            filter_enforced:
              default_value:
                denominator: HUNDRED
                numerator: 0
              runtime_key: http_local_rate_limiter
            response_headers_to_add:
            - append: false
              header:
                key: x-local-rate-limit
                value: "true"
            stat_prefix: http_local_rate_limiter
            # default bucket: 1100 tokens, refilled every second
            token_bucket:
              fill_interval: 1s
              max_tokens: 1100
              tokens_per_fill: 1100
  workloadSelector:
    labels:
      app.kubernetes.io/component: istio-ingressgateway
This was the configuration for the test.
cc @cam634 The configuration seems fine. From the configuration, requests with the /healthcheck/ path will be limited to 100/s and other requests will be limited to 1100/s? Right?
Are you sure the test requests have the expected prefix? Could you also give an example?
(PS: the current local rate limit uses a timer-based token bucket, which can sometimes be inaccurate. But your results still seem weird because the difference is too big.)
Yeah, that's exactly what we are trying to do.
Are you sure the test requests have the expected prefix? Could you also give an example?
This was my original thought as well, but what debunked it for me was that we hold at the correct rate-limited number until about 2,000 RPS. We were calling a path called /ping.
I have not looked at the current code, but could the filter possibly be overwhelmed at certain rates? The rate at which it leaks above the limit doesn't seem to directly correlate with the incoming RPS.
We are going to test at more realistic numbers, but I am trying to understand why it would happen even at the lower levels.
Is there anything else you would recommend looking into? CC: @wbpcode
@cam634 Actually, I have no other ideas about this problem if the HPA is really disabled. I am working on a new token bucket solution which should provide more stable results, but it still needs some more time.
In case someone else stumbles upon this: we retested with a higher limit and, separately, with a longer time frame. The higher limit with the 1-second fill interval didn't work at all. Then, after increasing the time frame, we saw the issue arise each time at 150,000 rpm (2,500 RPS); past that rate the effective limit doubles. We also confirmed we saw no HPA activity.
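For illustration, a bucket over a longer time frame for the same nominal 100 RPS limit might look roughly like the snippet below; the 60s interval and token counts are assumed for the sketch, not the exact values used in the retest.
            # hypothetical per-minute bucket for a nominal 100 RPS limit
            token_bucket:
              fill_interval: 60s
              max_tokens: 6000
              tokens_per_fill: 6000
              # note: the full 6000-token allowance can be consumed as a burst
              # at any point within the 60s interval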
Follow-up: we found the reason. It turns out the limit is applied per host on our ingress gateway. For example, we have one route for test1.com and another for test2.com, and each of them gets its own limit. Can we have a combined limit? CC: @wbpcode
Figured it out: it's because the HTTP_ROUTE patch distributes the per-route config to each service's route rather than taking effect once at the Istio gateway, so each service ends up with its own rate limit.
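If a single limit shared across all hosts/routes on the gateway is the goal, one option may be to configure the token bucket on the listener-level HTTP filter itself instead of via the per-route typed_per_filter_config, since each per-route override owns its own bucket while a filter-level bucket is shared by every request passing through that listener. A minimal, untested sketch of that variant, reusing the names and numbers from the config above:
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          value:
            stat_prefix: http_local_rate_limiter
            # one bucket owned by the listener-level filter, shared by all hosts/routes
            token_bucket:
              fill_interval: 1s
              max_tokens: 1100
              tokens_per_fill: 1100
            filter_enabled:
              default_value:
                numerator: 100
                denominator: HUNDRED
            filter_enforced:
              default_value:
                numerator: 100
                denominator: HUNDRED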