[BUG] ZeroMQ Transport Error on Timeout/Exception
Description
When sending a message via the ZeroMQ transport in 3006.5, a flurry of ERROR logs are seen if the request times out or an exception occurs during execution.
user@salt-minion:/# salt-call test.ping -lerror
11:54:03.919 [salt.minion ]: 284 [ 2127] [ERROR ] No master could be reached or all masters denied the minion's connection attempt.
11:54:03.919 [tornado.application]: 284 [ 2127] [ERROR ] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0xffff921de480>, <salt.ext.tornado.concurrent.Future object at 0xffff95340e10>)
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 603, in _send_recv
recv = yield self.socket.recv()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1056, in run
value = future.result()
^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
zmq.eventloop.future.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 605, in _send_recv
future.set_exception(exc)
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
self.set_exc_info(
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
self._set_done()
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
ret = callback()
^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
future.result()
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 612, in _send_recv
future.set_exception(exc)
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
self.set_exc_info(
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
self._set_done()
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
11:54:03.923 [tornado.application]: 284 [ 2127] [ERROR ] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0xffff921dede0>, <salt.ext.tornado.concurrent.Future object at 0xffff922119d0>)
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 601, in _send_recv
yield self.socket.send(message)
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'send'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
ret = callback()
^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
future.result()
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1070, in run
yielded = self.gen.send(value)
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 612, in _send_recv
future.set_exception(exc)
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
self.set_exc_info(
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
self._set_done()
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
11:54:03.925 [tornado.application]: 284 [ 2127] [ERROR ] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0xffff921df6a0>, <salt.ext.tornado.concurrent.Future object at 0xffff92212650>)
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 601, in _send_recv
yield self.socket.send(message)
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'send'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
ret = callback()
^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
future.result()
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1070, in run
yielded = self.gen.send(value)
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 612, in _send_recv
future.set_exception(exc)
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
self.set_exc_info(
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
self._set_done()
File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
Unable to sign_in to master: Attempt to authenticate with the salt master failed with timeout error
These messages are not indicative of a functional issue - the response is received by the Salt Minion and it returns as expected (note the last line displaying the clear "timeout error") - but are part of the completion of the coroutine that the message sending/receiving now runs in.
The root cause of these messages is the fact that the error handling code does not check if the future is done (meaning it will have already yielded to the calling code, hence the lack of impact, apart from logging) before attempting to set the exception.
Setup This was produced in a local environment with a code change to reliably replicate the situation:
diff --git a/salt/channel/server.py b/salt/channel/server.py
index b6d51fef08..17527fde83 100644
--- a/salt/channel/server.py
+++ b/salt/channel/server.py
@@ -9,6 +9,7 @@ import hashlib
import logging
import os
import shutil
+import time
import salt.crypt
import salt.ext.tornado.gen
@@ -149,6 +150,7 @@ class ReqServerChannel:
# intercept the "_auth" commands, since the main daemon shouldn't know
# anything about our key auth
if payload["enc"] == "clear" and payload.get("load", {}).get("cmd") == "_auth":
+ time.sleep(30)
raise salt.ext.tornado.gen.Return(
self._auth(payload["load"], sign_messages)
)
I introduced a 30 second sleep in the channel/server implementation to artificially create a timeout when signing into the Salt Master.
Steps to Reproduce the behavior
- Salt Master with the above code change
- Salt Minion configured with
auth_timeout: 1(this is not necessary, just speeds up demonstration of the issue)
Expected behavior
When an exception occurs but the future is already done (meaning the calling process received a response - whatever it may be), the coroutine does not attempt to complete that future again and end up dumping a bunch of spurious ERROR logs.
Versions Report
Note This report was taken with code at the v3006.5 tag (0472fd381e) but shows up as 3007.0 below for some
salt --versions-report
reason.Salt Version:
Salt: 3007.0
Python Version:
Python: 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0]
Dependency Versions:
cffi: 1.15.1
cherrypy: Not Installed
dateutil: 2.8.2
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.2
libgit2: Not Installed
looseversion: 1.0.3
M2Crypto: 0.38.0
Mako: Not Installed
msgpack: 1.0.3
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 23.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.11.0
pygit2: Not Installed
python-gnupg: 0.4.9
PyYAML: 6.0
PyZMQ: 24.0.1
relenv: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: debian 12.2 bookworm
locale: utf-8
machine: aarch64
release: 6.4.16-linuxkit
system: Linux
version: Debian GNU/Linux 12.2 bookworm
noticing similar error trying to start minion on rocky9 vagrant instance
2024-02-06 14:54:34,285 [tornado.application:640 ][ERROR ][1134] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f88ee3d5820>, <salt.ext.tornado.concurrent.Future object at 0x7f88ee7f5ca0>)
Traceback (most recent call last):
File "/opt/salt/lib64/python3.9/site-packages/salt/transport/zeromq.py", line 601, in _send_recv
yield self.socket.send(message)
AttributeError: 'NoneType' object has no attribute 'send'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
ret = callback()
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
return fn(*args, **kwargs)
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
future.result()
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/gen.py", line 1070, in run
yielded = self.gen.send(value)
File "/opt/salt/lib64/python3.9/site-packages/salt/transport/zeromq.py", line 612, in _send_recv
future.set_exception(exc)
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
self.set_exc_info(
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
self._set_done()
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
2024-02-06 14:54:34,288 [tornado.application:640 ][ERROR ][1134] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f88ee3dd040>, <salt.ext.tornado.concurrent.Future object at 0x7f88ee3d4d90>)
Traceback (most recent call last):
File "/opt/salt/lib64/python3.9/site-packages/salt/transport/zeromq.py", line 601, in _send_recv
yield self.socket.send(message)
AttributeError: 'NoneType' object has no attribute 'send'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
ret = callback()
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
return fn(*args, **kwargs)
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
future.result()
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/gen.py", line 1070, in run
yielded = self.gen.send(value)
File "/opt/salt/lib64/python3.9/site-packages/salt/transport/zeromq.py", line 612, in _send_recv
future.set_exception(exc)
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
self.set_exc_info(
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
self._set_done()
File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
root@min2 ~]# salt-minion --versions-report
Salt Version:
Salt: 3006.6
Python Version:
Python: 3.9.18 (main, Jan 4 2024, 00:00:00)
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.3
libgit2: Not Installed
looseversion: 1.3.0
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.7
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 23.2
pycparser: Not Installed
pycrypto: Not Installed
pycryptodome: 3.20.0
pygit2: Not Installed
python-gnupg: Not Installed
PyYAML: 6.0.1
PyZMQ: 25.1.2
relenv: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: rocky 9.3 Blue Onyx
locale: utf-8
machine: x86_64
release: 5.14.0-362.18.1.el9_3.x86_64
system: Linux
version: Rocky Linux 9.3 Blue Onyx
This was released in 3006.7