kombu icon indicating copy to clipboard operation
kombu copied to clipboard

smembers call in a loop

Open lucas03 opened this issue 2 years ago • 1 comments

Hello, we were overloading memorystore (redis on GCP) instance and it seems it might be caused by SMEMBERS call in kombu. We use celery[pytest]==5.3.4 and kombu==5.3.2, but I believe it was happening on celery==5.3.1 and kombu==5.3.1.

We are running periodic worker like this:

        command: [celery]
        args: [-A, kw.tasks.schedule, worker, -Q, periodic_py3]

and async worker like this:

        command: [celery]
        args: [-A, kw.tasks.worker_tasks.app, worker, -Q, async_py3]

We noticed peaks in SMEMBERS calls when releasing both of these. This is what it looks like in GCP, ~900 calls per minute (UTC +2): image Redis command latency in datadog (UTC): image Redis command count didn't change significantly and we don't use SMEMBERS in our code. This was traceback we saw in our logs, which I believe are related:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/virtual/base.py", line 1012, in _deliver
    callback = self._callbacks[queue]
KeyError: 'cb11f1ed-02a5-3023-8b76-0a98642d2dba.reply.celery.pidbox'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 785, in _do_restore_message
    for queue in self._lookup(exchange, routing_key):
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/virtual/base.py", line 722, in _lookup
    self.get_table(exchange),
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 1064, in get_table
    values = client.smembers(key)
  File "/usr/local/lib/python3.10/site-packages/redis/commands/core.py", line 3399, in smembers
    return self.execute_command("SMEMBERS", name)
  File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/redis/patch.py", line 125, in _traced_execute_command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 508, in execute_command
    return conn.retry.call_with_retry(
  File "/usr/local/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 509, in <lambda>
    lambda: self._send_command_parse_response(
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 485, in _send_command_parse_response
    return self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 525, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 493, in read_response
    response = self._parser.read_response(disable_decoding=disable_decoding)
  File "/usr/local/lib/python3.10/site-packages/redis/_parsers/hiredis.py", line 124, in read_response
    self.read_from_socket()
  File "/usr/local/lib/python3.10/site-packages/redis/_parsers/hiredis.py", line 84, in read_from_socket
    bufflen = self._sock.recv_into(self._buffer)
  File "/app/kw/tasks/celery_health.py", line 33, in handler
    raise HardTimeoutError
kw.tasks.celery_health.HardTimeoutError

In datadog, I can see traces of SMEMBERS like this: image When I look at span from redis, it's repeating same commands over and over again. I assume some variable changes and that is why it keeps executing same commands: image Within one request, it looks like this: image When I zoom in, I see this: image the trace shows this error on ZREM HDEL call:

redis.exceptions.WatchError: Watched variable changed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/virtual/base.py", line 1012, in _deliver
    callback = self._callbacks[queue]
KeyError: '0e72a63e-eeb6-3cc3-8df5-ad87f9563746.reply.celery.pidbox'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/trace_utils_redis.py", line 87, in _trace_redis_execute_pipeline
    yield span
  File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/redis/patch.py", line 156, in _traced_execute_pipeline
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1431, in execute
    return conn.retry.call_with_retry(
  File "/usr/local/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1432, in <lambda>
    lambda: execute(conn, stack, raise_on_error),
  File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1311, in _execute_transaction
    raise WatchError("Watched variable changed.")
redis.exceptions.WatchError: Watched variable changed.

Is this behaviour expected or it's a bug?

lucas03 avatar Sep 07 '23 14:09 lucas03

I'm not sure it is a redis client bug or kombu!

auvipy avatar Sep 09 '23 08:09 auvipy