Producer thread stuck in an error loop after a connection SSL handshake error
When the SSL handshake fails with an OSError, the producer thread keeps retrying on the same connection, so the same error occurs again on every attempt. The workaround is to restart the stuck producers.
This is with version 2.0.1, though it seems the relevant code has not changed since.
ERROR:kafka.producer.sender:Uncaught error in kafka producer I/O thread
  File "/usr/local/lib/python3.6/site-packages/kafka/producer/sender.py", line 60, in run
    self.run_once()
  File "/usr/local/lib/python3.6/site-packages/kafka/producer/sender.py", line 160, in run_once
    self._client.poll(timeout_ms=poll_timeout_ms)
  File "/usr/local/lib/python3.6/site-packages/kafka/client_async.py", line 580, in poll
    self._maybe_connect(node_id)
  File "/usr/local/lib/python3.6/site-packages/kafka/client_async.py", line 390, in _maybe_connect
    conn.connect()
  File "/usr/local/lib/python3.6/site-packages/kafka/conn.py", line 426, in connect
    if self._try_handshake():
  File "/usr/local/lib/python3.6/site-packages/kafka/conn.py", line 505, in _try_handshake
    self._sock.do_handshake()
  File "/usr/lib64/python3.6/ssl.py", line 1036, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib64/python3.6/ssl.py", line 648, in do_handshake
    self._sslobj.do_handshake()
OSError: [Errno 0] Error
Perhaps OSError should close the connection here to prevent error looping: https://github.com/dpkp/kafka-python/blob/7ac6c6e29099ccba4d50f5b842972dd7332d0e58/kafka/conn.py#L513
Newer versions of Python may have been fixed so that they no longer raise OSError in this case: https://bugs.python.org/issue31122
Though it seems to me it would be best if kafka-python guarded against this situation.
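To illustrate the kind of guard I have in mind, here is a rough sketch. It is not kafka-python's actual code: `SketchConnection`, `FailingSocket`, `try_handshake` and `last_error` are hypothetical names made up for this example. The idea is only that a non-retryable OSError during the handshake closes the connection, so the next connect attempt starts with a fresh socket instead of hitting the same error again.

```python
import ssl


class FailingSocket:
    """Hypothetical stand-in for an SSL-wrapped socket whose handshake always fails."""

    def __init__(self):
        self.closed = False

    def do_handshake(self):
        # Mimics the error from the traceback: OSError: [Errno 0] Error
        raise OSError(0, "Error")

    def close(self):
        self.closed = True


class SketchConnection:
    """Hypothetical connection wrapper (assumption: not the real BrokerConnection)."""

    def __init__(self, sock):
        self._sock = sock
        self.last_error = None

    def close(self, error=None):
        # Drop the socket so that a later connect attempt has to create a new one.
        if self._sock is not None:
            self._sock.close()
            self._sock = None
        self.last_error = error

    def try_handshake(self):
        try:
            self._sock.do_handshake()
            return True
        except (ssl.SSLWantReadError, ssl.SSLWantWriteError):
            # Non-blocking handshake still in progress: keep the socket and retry later.
            # These are OSError subclasses, so they must be caught before OSError.
            return False
        except OSError as exc:
            # Any other socket-level failure: close instead of retrying on the same socket.
            self.close(exc)
            return False
```

With this shape, a failed handshake leaves the connection closed with the error recorded, and whatever drives reconnection can apply its usual backoff rather than looping on a broken socket.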
Just found https://github.com/dpkp/kafka-python/pull/2100 that fixes this issue.