Exponential backoff and error handling for Redis connection errors
When our application tries to connect to Redis and encounters an error, it doesn't employ any back-off strategy. Continuously retrying without delays can put an unnecessary load on both the application and the Redis server, potentially exacerbating the problem.
Proposed Solution:
- Exponential Backoff: Implement an exponential backoff mechanism that progressively increases the time between retry attempts. This will give transient issues (e.g., network glitches) a chance to self-resolve and reduce the load on our systems (see the delay sketch after this list).
  - Start with a short delay, e.g., 2 seconds.
  - Double the delay with each subsequent attempt.
  - Cap the delay at a reasonable maximum, e.g., 5 minutes.
- Improved Error Handling: Classify the connection errors to determine if a retry would be beneficial.
  - If it's a transient error (e.g., network hiccup), retry with backoff.
  - If it's a persistent error (e.g., authentication failure), log the error and alert the admins without retrying endlessly.
- Logging & Monitoring: Enhance the logging mechanism to capture detailed information about the nature of the Redis connection failures. This will be instrumental for debugging and monitoring purposes.
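To make the backoff parameters above concrete, the per-attempt delay could be computed roughly like this (a minimal sketch; the 2-second base and 5-minute cap are just the example values from the list, and backoffDelay is a hypothetical helper name):

```go
import "time"

// backoffDelay returns the delay to wait before retry number `attempt` (0-based):
// it starts at baseDelay, doubles with every attempt, and is capped at maxDelay
// (e.g. baseDelay = 2*time.Second, maxDelay = 5*time.Minute).
func backoffDelay(attempt int, baseDelay, maxDelay time.Duration) time.Duration {
	delay := baseDelay << uint(attempt) // 2s, 4s, 8s, ...
	if delay <= 0 || delay > maxDelay { // <= 0 guards against shift overflow
		delay = maxDelay
	}
	return delay
}
```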
Implementation Details:
- Introduce a function retryWithBackoff() that wraps the Redis connection logic and implements the exponential backoff.
- Use Go's error wrapping to provide more context on where and why the connection failed (a brief sketch follows this list).
- Ensure that critical failures (e.g., authentication issues) are prominently logged and can trigger alerts.
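A minimal sketch of the error wrapping and critical-failure handling, assuming we classify auth problems by the error strings Redis uses for them; pingWithContext and isAuthError are hypothetical helper names, not existing code in the adapter:

```go
import (
	"context"
	"fmt"
	"log"
	"strings"
)

// pingWithContext shows the wrapping idea: annotate the low-level error with %w
// so callers keep the original cause, and log persistent failures prominently
// instead of retrying them. Hypothetical helper, for illustration only.
func (r *RedisAdapter) pingWithContext(ctx context.Context) error {
	if err := r.client.Ping(ctx).Err(); err != nil {
		wrapped := fmt.Errorf("redis adapter: connecting to redis: %w", err)
		if isAuthError(err) {
			log.Printf("CRITICAL: %v", wrapped) // hook for alerting goes here
		}
		return wrapped
	}
	return nil
}

// isAuthError is a placeholder classification based on the error strings Redis
// returns for authentication problems.
func isAuthError(err error) bool {
	s := err.Error()
	return strings.HasPrefix(s, "NOAUTH") || strings.HasPrefix(s, "WRONGPASS") ||
		strings.HasPrefix(s, "ERR invalid password")
}
```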
It seems like the go-redis package itself decides whether to retry or not depending on the error received from Redis - https://github.com/redis/go-redis/blob/v8.11.5/error.go#L28. So maybe we could reuse this same logic if possible (we would need to copy this function since it is not exported).
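For reference, a local copy would be roughly along these lines. This is only a paraphrase from memory of the kind of checks that function does (unexpected EOFs, network errors, and a few retryable Redis reply prefixes); if we go this route we should copy the exact body from the linked error.go rather than rely on this sketch:

```go
import (
	"context"
	"io"
	"net"
	"strings"
)

// shouldRetry is a rough, unofficial paraphrase of go-redis's unexported check;
// copy the real implementation from error.go if we adopt this approach.
func shouldRetry(err error) bool {
	switch err {
	case nil, context.Canceled, context.DeadlineExceeded:
		return false
	case io.EOF, io.ErrUnexpectedEOF:
		return true
	}
	if _, ok := err.(net.Error); ok {
		return true // network-level errors are generally worth retrying
	}
	// Redis replies that indicate a temporary server-side condition.
	s := err.Error()
	return strings.HasPrefix(s, "LOADING ") ||
		strings.HasPrefix(s, "READONLY ") ||
		strings.HasPrefix(s, "CLUSTERDOWN ")
}
```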
By default, the retry limit for such errors is set to 3 (this can be modified in the redis.Options struct while creating the client - https://pkg.go.dev/github.com/go-redis/redis/v8@v8.11.5#Options). If we need to do exponential backoff, I guess we could set the retry limit to 1 here and write our custom exponential-backoff logic around it. So it would look something like this, roughly:
func (r *RedisAdapter) retryWithBackoff(ctx context.Context, f func(context.Context) (interface{}, error)) (interface{}, error) {
	result, err := f(ctx)
	if shouldRetry(err) { // logic borrowed from go-redis
		// retry 'n' times, waiting with exponentially increasing delays between attempts
	}
	return result, err
}
func (r *RedisAdapter) readMessagesFromQueue(ctx context.Context) ([]adapter.WebhookPayload, error) {
	/* BEFORE
	entries, err := r.client.XRead(ctx, &redis.XReadArgs{
		Streams: []string{r.queueName, r.lastID},
		Count:   5,
	}).Result()
	*/
	entries, err := r.retryWithBackoff(ctx, func(ctx context.Context) (interface{}, error) {
		return r.client.XRead(ctx, &redis.XReadArgs{
			Streams: []string{r.queueName, r.lastID},
			Count:   5,
		}).Result()
	})
	if err != nil {
		return nil, err
	}
	// ... convert entries (an interface{} holding []redis.XStream) into []adapter.WebhookPayload as before
}
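For completeness, the "retry 'n' times" part of retryWithBackoff could be filled in roughly as below. This is only a sketch: maxAttempts, the 2-second base, and the 5-minute cap are the example values from the proposal, and the ctx check makes sure we stop retrying once the caller gives up. If we wrap calls like this, we would probably also want to turn down go-redis's own MaxRetries in redis.Options so the two retry layers don't multiply.

```go
import (
	"context"
	"time"
)

// Placeholder constants taken from the proposal above; to be tuned.
const (
	maxAttempts = 5
	baseDelay   = 2 * time.Second
	maxDelay    = 5 * time.Minute
)

func (r *RedisAdapter) retryWithBackoff(ctx context.Context, f func(context.Context) (interface{}, error)) (interface{}, error) {
	var result interface{}
	var err error
	delay := baseDelay
	for attempt := 0; attempt < maxAttempts; attempt++ {
		result, err = f(ctx)
		if err == nil || !shouldRetry(err) {
			return result, err // success, or an error not worth retrying
		}
		select {
		case <-time.After(delay): // wait before the next attempt
		case <-ctx.Done():
			return nil, ctx.Err() // caller gave up; stop retrying
		}
		if delay *= 2; delay > maxDelay {
			delay = maxDelay
		}
	}
	return result, err // out of attempts; surface the last error
}
```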