Exponential backoff and error handling for Redis connection errors
When our application tries to connect to Redis and encounters an error, it doesn't employ any back-off strategy. Continuously retrying without delays can put an unnecessary load on both the application and the Redis server, potentially exacerbating the problem.
Proposed Solution:
- Exponential Backoff: Implement an exponential backoff mechanism that progressively increases the time between retry attempts. This will give transient issues (e.g., network glitches) a chance to self-resolve and reduce the load on our systems (see the delay sketch after this list).
  - Start with a short delay, e.g., 2 seconds.
  - Double the delay with each subsequent attempt.
  - Cap the delay at a reasonable maximum, e.g., 5 minutes.
- Improved Error Handling: Classify the connection errors to determine if a retry would be beneficial.
  - If it's a transient error (e.g., network hiccup), retry with backoff.
  - If it's a persistent error (e.g., authentication failure), log the error and alert the admins without retrying endlessly.
- Logging & Monitoring: Enhance the logging mechanism to capture detailed information about the nature of the Redis connection failures. This will be instrumental for debugging and monitoring purposes.
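To make the backoff parameters above concrete, the per-attempt delay could be computed roughly like this (a minimal sketch; the 2-second base and 5-minute cap are just the example values from the list, and backoffDelay is a hypothetical helper name):

```go
import "time"

// backoffDelay returns the delay to wait before retry number `attempt` (0-based):
// it starts at baseDelay, doubles with every attempt, and is capped at maxDelay
// (e.g. baseDelay = 2*time.Second, maxDelay = 5*time.Minute).
func backoffDelay(attempt int, baseDelay, maxDelay time.Duration) time.Duration {
	delay := baseDelay << uint(attempt) // 2s, 4s, 8s, ...
	if delay <= 0 || delay > maxDelay { // <= 0 guards against shift overflow
		delay = maxDelay
	}
	return delay
}
```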
Implementation Details:
- Introduce a function retryWithBackoff() that wraps the Redis connection logic and implements the exponential backoff.
- Use Go's error wrapping to provide more context on where and why the connection failed (a brief sketch follows this list).
- Ensure that critical failures (e.g., authentication issues) are prominently logged and can trigger alerts.
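A minimal sketch of the error wrapping and critical-failure handling, assuming we classify auth problems by the error strings Redis uses for them; pingWithContext and isAuthError are hypothetical helper names, not existing code in the adapter:

```go
import (
	"context"
	"fmt"
	"log"
	"strings"
)

// pingWithContext shows the wrapping idea: annotate the low-level error with %w
// so callers keep the original cause, and log persistent failures prominently
// instead of retrying them. Hypothetical helper, for illustration only.
func (r *RedisAdapter) pingWithContext(ctx context.Context) error {
	if err := r.client.Ping(ctx).Err(); err != nil {
		wrapped := fmt.Errorf("redis adapter: connecting to redis: %w", err)
		if isAuthError(err) {
			log.Printf("CRITICAL: %v", wrapped) // hook for alerting goes here
		}
		return wrapped
	}
	return nil
}

// isAuthError is a placeholder classification based on the error strings Redis
// returns for authentication problems.
func isAuthError(err error) bool {
	s := err.Error()
	return strings.HasPrefix(s, "NOAUTH") || strings.HasPrefix(s, "WRONGPASS") ||
		strings.HasPrefix(s, "ERR invalid password")
}
```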
It seems like the go-redis package itself decides whether to retry or not depending on the error received from Redis - https://github.com/redis/go-redis/blob/v8.11.5/error.go#L28. So maybe we could reuse this same logic if possible (we would need to copy this function since it is not exported).
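For reference, a local copy would be roughly along these lines. This is only a paraphrase from memory of the kind of checks that function does (unexpected EOFs, network errors, and a few retryable Redis reply prefixes); if we go this route we should copy the exact body from the linked error.go rather than rely on this sketch:

```go
import (
	"context"
	"io"
	"net"
	"strings"
)

// shouldRetry is a rough, unofficial paraphrase of go-redis's unexported check;
// copy the real implementation from error.go if we adopt this approach.
func shouldRetry(err error) bool {
	switch err {
	case nil, context.Canceled, context.DeadlineExceeded:
		return false
	case io.EOF, io.ErrUnexpectedEOF:
		return true
	}
	if _, ok := err.(net.Error); ok {
		return true // network-level errors are generally worth retrying
	}
	// Redis replies that indicate a temporary server-side condition.
	s := err.Error()
	return strings.HasPrefix(s, "LOADING ") ||
		strings.HasPrefix(s, "READONLY ") ||
		strings.HasPrefix(s, "CLUSTERDOWN ")
}
```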
By default, the retry limit for such errors is set to 3 (this can be modified in the redis.Options struct while creating the client - https://pkg.go.dev/github.com/go-redis/redis/v8@v8.11.5#Options). If we need to do exponential backoff, I guess we could set the retry limit to 1 here and write our custom exponential-backoff logic around it. So it would look something like this, roughly:
func (r *RedisAdapter) retryWithBackoff(ctx context.Context, f func(context.Context) (interface{}, error)) (interface{}, error) {
	result, err := f(ctx)
	if shouldRetry(err) { // logic borrowed from go-redis
		// retry 'n' times, waiting with exponentially increasing delays between attempts
	}
	return result, err
}
func (r *RedisAdapter) readMessagesFromQueue(ctx context.Context) ([]adapter.WebhookPayload, error) {
	/* BEFORE
	entries, err := r.client.XRead(ctx, &redis.XReadArgs{
		Streams: []string{r.queueName, r.lastID},
		Count:   5,
	}).Result()
	*/
	entries, err := r.retryWithBackoff(ctx, func(ctx context.Context) (interface{}, error) {
		return r.client.XRead(ctx, &redis.XReadArgs{
			Streams: []string{r.queueName, r.lastID},
			Count:   5,
		}).Result()
	})
	if err != nil {
		return nil, err
	}
	// ... convert entries (an interface{} holding []redis.XStream) into []adapter.WebhookPayload as before
}
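For completeness, the "retry 'n' times" part of retryWithBackoff could be filled in roughly as below. This is only a sketch: maxAttempts, the 2-second base, and the 5-minute cap are the example values from the proposal, and the ctx check makes sure we stop retrying once the caller gives up. If we wrap calls like this, we would probably also want to turn down go-redis's own MaxRetries in redis.Options so the two retry layers don't multiply.

```go
import (
	"context"
	"time"
)

// Placeholder constants taken from the proposal above; to be tuned.
const (
	maxAttempts = 5
	baseDelay   = 2 * time.Second
	maxDelay    = 5 * time.Minute
)

func (r *RedisAdapter) retryWithBackoff(ctx context.Context, f func(context.Context) (interface{}, error)) (interface{}, error) {
	var result interface{}
	var err error
	delay := baseDelay
	for attempt := 0; attempt < maxAttempts; attempt++ {
		result, err = f(ctx)
		if err == nil || !shouldRetry(err) {
			return result, err // success, or an error not worth retrying
		}
		select {
		case <-time.After(delay): // wait before the next attempt
		case <-ctx.Done():
			return nil, ctx.Err() // caller gave up; stop retrying
		}
		if delay *= 2; delay > maxDelay {
			delay = maxDelay
		}
	}
	return result, err // out of attempts; surface the last error
}
```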