
Exponential backoff and improved error handling for Redis connections

Open koladev32 opened this issue 2 years ago • 1 comment

When our application tries to connect to Redis and encounters an error, it doesn't employ any backoff strategy. Continuously retrying without delays puts unnecessary load on both the application and the Redis server, potentially exacerbating the problem.

Proposed Solution:

  1. Exponential Backoff: Implement an exponential backoff mechanism that progressively increases the time between retry attempts. This will give transient issues (e.g., network glitches) a chance to self-resolve and reduce the load on our systems.

    • Start with a short delay, e.g., 2 seconds.
    • Double the delay with each subsequent attempt.
    • Cap the delay at a reasonable maximum, e.g., 5 minutes.
  2. Improved Error Handling: Classify the connection errors to determine whether a retry would be beneficial (the backoff schedule and this classification are sketched after this list).

    • If it's a transient error (e.g., network hiccup), retry with backoff.
    • If it's a persistent error (e.g., authentication failure), log the error and alert the admins without retrying endlessly.
  3. Logging & Monitoring: Enhance the logging mechanism to capture detailed information about the nature of the Redis connection failures. This will be instrumental for debugging and monitoring purposes.
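
A minimal sketch of items 1 and 2 follows; all names here (baseDelay, maxDelay, backoffDelay, isTransient) are hypothetical and don't exist in the codebase yet, and the set of errors treated as persistent is an assumption to be refined:

import (
	"errors"
	"net"
	"strings"
	"time"
)

const (
	baseDelay = 2 * time.Second // initial delay (item 1)
	maxDelay  = 5 * time.Minute // cap on the delay (item 1)
)

// backoffDelay returns the wait before retry attempt n (0-indexed):
// 2s, 4s, 8s, ..., capped at 5 minutes.
func backoffDelay(attempt int) time.Duration {
	d := baseDelay << uint(attempt) // doubles with each attempt
	if d <= 0 || d > maxDelay {     // d <= 0 guards against shift overflow
		return maxDelay
	}
	return d
}

// isTransient classifies connection errors (item 2): retry network-level
// failures, give up on errors that won't resolve on their own.
func isTransient(err error) bool {
	var netErr net.Error
	if errors.As(err, &netErr) {
		return true // timeouts, refused connections, DNS hiccups
	}
	// Replies like NOAUTH or WRONGPASS indicate a persistent
	// misconfiguration; retrying won't help.
	msg := err.Error()
	return !strings.Contains(msg, "NOAUTH") && !strings.Contains(msg, "WRONGPASS")
}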

Implementation Details:

  • Introduce a function retryWithBackoff() that wraps the Redis connection logic and implements the exponential backoff (a sketch follows these bullets).
  • Use Go's error wrapping to provide more context on where and why the connection failed.
  • Ensure that critical failures (e.g., authentication issues) are prominently logged and can trigger alerts.
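
A hedged sketch of the wrapper itself, reusing the hypothetical helpers above; maxAttempts and the log messages are assumptions, and it additionally needs the context, fmt, log, and time imports:

// retryWithBackoff retries connect with exponentially increasing delays.
// Persistent failures are wrapped (%w) for context and logged prominently.
func retryWithBackoff(ctx context.Context, connect func(context.Context) error) error {
	const maxAttempts = 10 // assumed; the issue leaves the limit open
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = connect(ctx); err == nil {
			return nil
		}
		if !isTransient(err) {
			log.Printf("CRITICAL: redis connection failed, not retrying: %v", err)
			return fmt.Errorf("connecting to redis: %w", err)
		}
		delay := backoffDelay(attempt)
		log.Printf("redis connection attempt %d failed: %v; retrying in %s", attempt+1, err, delay)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
	}
	return fmt.Errorf("connecting to redis after %d attempts: %w", maxAttempts, err)
}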

koladev32 · Sep 09 '23

It seems the go-redis package itself decides whether to retry based on the error received from Redis - https://github.com/redis/go-redis/blob/v8.11.5/error.go#L28. We could reuse the same logic if possible (we would need to copy this function, since it is not exported). By default, the retry limit for such errors is 3; this can be changed in the redis.Options struct when creating the client - https://pkg.go.dev/github.com/go-redis/redis/[email protected]#Options. If we need exponential backoff, I guess we could set the built-in retry limit to 1 and write our own backoff logic on top. So it would look roughly like this:

func (r *RedisAdapter) retryWithBackoff(ctx context.Context, f func(context.Context) (interface{}, error)) (interface{}, error) {
	result, err := f(ctx)
	if shouldRetry(err) { // logic copied from go-redis, since it is not exported
		// retry 'n' times, backing off exponentially between attempts
	}
	return result, err
}

func (r *RedisAdapter) readMessagesFromQueue(ctx context.Context) ([]adapter.WebhookPayload, error) {
	/* BEFORE
	entries, err := r.client.XRead(ctx, &redis.XReadArgs{
		Streams: []string{r.queueName, r.lastID},
		Count:   5,
	}).Result()
	*/
	entries, err := r.retryWithBackoff(ctx, func(ctx context.Context) (interface{}, error) {
		return r.client.XRead(ctx, &redis.XReadArgs{
			Streams: []string{r.queueName, r.lastID},
			Count:   5,
		}).Result()
	})
	// ... convert entries to []adapter.WebhookPayload and return
}
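
For reference, the built-in retry limit mentioned above is set on redis.Options when creating the client; a quick sketch (the Addr value is a placeholder):

client := redis.NewClient(&redis.Options{
	Addr:       "localhost:6379",
	MaxRetries: 1, // default is 3; -1 (not 0) disables the built-in retries
})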

gokullan · Nov 01 '24