smembers call in a loop
Hello,
we were overloading memorystore (redis on GCP) instance and it seems it might be caused by SMEMBERS call in kombu. We use celery[pytest]==5.3.4 and kombu==5.3.2, but I believe it was happening on celery==5.3.1 and kombu==5.3.1.
We are running periodic worker like this:
command: [celery]
args: [-A, kw.tasks.schedule, worker, -Q, periodic_py3]
and async worker like this:
command: [celery]
args: [-A, kw.tasks.worker_tasks.app, worker, -Q, async_py3]
We noticed peaks in SMEMBERS calls when releasing both of these. This is what it looks like in GCP, ~900 calls per minute (UTC +2):
Redis command latency in datadog (UTC):
Redis command count didn't change significantly and we don't use
SMEMBERS in our code.
This was traceback we saw in our logs, which I believe are related:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/kombu/transport/virtual/base.py", line 1012, in _deliver
callback = self._callbacks[queue]
KeyError: 'cb11f1ed-02a5-3023-8b76-0a98642d2dba.reply.celery.pidbox'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 785, in _do_restore_message
for queue in self._lookup(exchange, routing_key):
File "/usr/local/lib/python3.10/site-packages/kombu/transport/virtual/base.py", line 722, in _lookup
self.get_table(exchange),
File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 1064, in get_table
values = client.smembers(key)
File "/usr/local/lib/python3.10/site-packages/redis/commands/core.py", line 3399, in smembers
return self.execute_command("SMEMBERS", name)
File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/redis/patch.py", line 125, in _traced_execute_command
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 508, in execute_command
return conn.retry.call_with_retry(
File "/usr/local/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
return do()
File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 509, in <lambda>
lambda: self._send_command_parse_response(
File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 485, in _send_command_parse_response
return self.parse_response(conn, command_name, **options)
File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 525, in parse_response
response = connection.read_response()
File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 493, in read_response
response = self._parser.read_response(disable_decoding=disable_decoding)
File "/usr/local/lib/python3.10/site-packages/redis/_parsers/hiredis.py", line 124, in read_response
self.read_from_socket()
File "/usr/local/lib/python3.10/site-packages/redis/_parsers/hiredis.py", line 84, in read_from_socket
bufflen = self._sock.recv_into(self._buffer)
File "/app/kw/tasks/celery_health.py", line 33, in handler
raise HardTimeoutError
kw.tasks.celery_health.HardTimeoutError
In datadog, I can see traces of SMEMBERS like this:
When I look at span from redis, it's repeating same commands over and over again. I assume some variable changes and that is why it keeps executing same commands:
Within one request, it looks like this:
When I zoom in, I see this:
the trace shows this error on
ZREM HDEL call:
redis.exceptions.WatchError: Watched variable changed.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/kombu/transport/virtual/base.py", line 1012, in _deliver
callback = self._callbacks[queue]
KeyError: '0e72a63e-eeb6-3cc3-8df5-ad87f9563746.reply.celery.pidbox'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/trace_utils_redis.py", line 87, in _trace_redis_execute_pipeline
yield span
File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/redis/patch.py", line 156, in _traced_execute_pipeline
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1431, in execute
return conn.retry.call_with_retry(
File "/usr/local/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
return do()
File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1432, in <lambda>
lambda: execute(conn, stack, raise_on_error),
File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1311, in _execute_transaction
raise WatchError("Watched variable changed.")
redis.exceptions.WatchError: Watched variable changed.
Is this behaviour expected or it's a bug?
I'm not sure it is a redis client bug or kombu!