limiter icon indicating copy to clipboard operation
limiter copied to clipboard

Potential race between Redis key TTL and Context.Reset?

Open Xopherus opened this issue 5 years ago • 4 comments

I'm seeing what seems to be a race condition where the limiter key is still present in the Redis store after the code waits for Context.Reset to be reached. This causes the next iteration of the code to also be rate-limited, but Context.Reset is already past, so the sleep duration is negative.

Been seeing this with the following:

  • ulele/limiter v3.5.0
  • redis store (redis 5.0.8)

Here's a snippet of the code which should be able to reproduce it:

for {
    limit, err := r.Limiter.Get(ctx, key)
    if err != nil {
	log.Printf("[ERROR] failed to fetch rate-limit context %s", err)
	return err
    }

    if limit.Reached {
        sleep := time.Until(time.Unix(limit.Reset, 0))

        log.Printf("[ERROR] client has proactively throttled for %s", sleep.String())
        <-time.After(sleep)
        continue
    }
    
    // do stuff 
}

And here are the logs:

2020/04/07 19:41:41 [DEBUG] ratelimit context: {Limit:2 Remaining:0 Reset:1586288531 Reached:true}
2020/04/07 19:41:41 [ERROR] client has proactively throttled for 29.954364124s
2020/04/07 19:42:11 [DEBUG] ratelimit context: {Limit:2 Remaining:0 Reset:1586288531 Reached:true}
2020/04/07 19:42:11 [ERROR] client has proactively throttled for -81.245359ms

Can anyone else reproduce this?

Xopherus avatar Apr 07 '20 20:04 Xopherus

Hello,

Can you give me a gist with a full example on how you initialize the store, what period do you use, what code do you execute after do stuff because you've logged something in débug and I don't see it in your snippet. And also, what topology of redis do you use (simple, sentinel, cluster?)

Thank you.

novln avatar Apr 07 '20 22:04 novln

Here's what I could come up with @novln. I was able to reproduce with rates of 1-S and 1-M. I've also attached the code + go.mod/go.sum files.

https://gist.github.com/Xopherus/7423f7b2b60264b94bb9dd5c9ebe9c7d

Xopherus avatar Apr 18 '20 15:04 Xopherus

Ok I'm able to reproduce that, thank you for the gist.

I think the issue is how we compute the Reset value and how expiration works on redis. For a small fixes (for the time being) add one second to the Reset value. I'll try to find some time to investigate that issue further.

Cheers,

novln avatar May 22 '20 16:05 novln

Because the expiration is somewhat imprecise between the PTTL redis command, the conversion from this duration to a concrete time.Time and then, the conversion to unix timestamp, we loose some milliseconds of precision.

I won't fix this issue (for the moment) because it means a refactoring that I don't want to manage. My recommendation: If the sleep duration is below zero, change it to 500 ms to force a next tick.

With that being said, I'll gladly review pull request.

Cheers,

novln avatar May 22 '20 17:05 novln