Memory leak when RateLimitOverride is set with uniq ip on every call to ShouldRateLimit method
In our service we use envoyproxy/ratelimit as sidecar and are making grpc calls to ShouldRateLimit method explicitly without envoy proxy.
Simplified config looks like
---
domain: app
descriptors:
- key: ip
rate_limit:
unit: minute
requests_per_unit: 1000
In our request we specify Limit (limit override) - https://github.com/envoyproxy/java-control-plane/blob/main/api/src/main/proto/envoy/extensions/common/ratelimit/v3/ratelimit.proto#L93 on every request which lead to this codepath getting executed - https://github.com/envoyproxy/ratelimit/blob/main/src/config/config_impl.go#L252-L261
In that if statement when unique IP is getting used on every request we are getting unique rateLimitKey and also as a result are getting new set of stats Counters this.statsManager.NewStats(rateLimitKey),. So number of stats Counters and keys always grows over time.
- Stats should not be collected at all if
USE_STATSDis set tofalse. - Stats should expire and stats cache should get cleaned up over time
UPDATE:
our simplified ratelimit request to looks like this:
{
"domain": "app",
"descriptors": [
{
"entries": {
"key": "our_entry_key",
"value": "customer1234_1.2.3.4"
},
"limit": {
"requests_per_unit": 1000,
"unit": "MINUTE"
}
}
],
"hits_addend": 1
}
^ note 2 things
- we have
limitdefined in each request and is different based on customer. In this case customer iscustomer1234 - key is built by
<customer_identifier>_<ip>
Because limit is defined this code path is getting executed
https://github.com/envoyproxy/ratelimit/blob/main/src/config/config_impl.go#L252-L261
which executes this.statsManager.NewStats(rateLimitKey) which will build whole bunch of new stats counters based on key (customer1234_1.2.3.4)
At one point we did not collect stats on a per key basis for this very reason. Has this regressed? Or do you have per-key/detailed stats enabled?
It looks like regression. There is no settings (settings.go) like ENABLE_PER_KEY_STATS but it would be great to have one.
Hmm, now I'm confused and I don't remember the history. I was thinking about https://github.com/envoyproxy/ratelimit/issues/181 which is not yet implemented. Is this a regression from https://github.com/envoyproxy/ratelimit/pull/242? We definitely should not be making a new metric for every key/value pair and we need to figure out when that was introduced and revert it.
issue has been introduced here in https://github.com/envoyproxy/ratelimit/pull/158
GetLimits --> NewRateLimit --> newRateLimitStats
OK, thanks for tracking down. cc @Pchelolo.
We definitely should not be creating a stat per key/value by default. @zdmytriv are you interested in fixing? "Detailed stats" should be opt-in only per https://github.com/envoyproxy/ratelimit/issues/181.
cc @ysawa0
are you interested in fixing?
I'm new to GO so probably not the best candidate