Audit for possible recursion error in logging code
What happened?
We recently fixed a recursion error in logging (#4255) switching a call from logging to warnings. We need to audit for this kind of issues to other part of the logging code. We want to also evaluate if warnings is really the right abastraction because of integrations with the logging module that may lead to the same issue https://docs.python.org/3/library/logging.html#integration-with-the-warnings-module
Would you like to implement a fix?
None
To document another scenario where this happens is if the OTLP endpoint is misconfigured (unknown host), you can get the expected intermittent transient errors like:
[2025-05-21 21:39:32,087] [WARNING] in [opentelemetry.exporter.otlp.proto.grpc.exporter 1987568 135592897660480]: Transient error StatusCode.DEADLINE_EXCEEDED encountered while exporting traces to foo:4317, retrying in 1s.
[2025-05-21 21:39:32,101] [WARNING] in [opentelemetry.exporter.otlp.proto.grpc.exporter 1987568 135592880875072]: Transient error StatusCode.DEADLINE_EXCEEDED encountered while exporting logs to foo:4317, retrying in 1s.
[2025-05-21 21:39:34,170] [WARNING] in [opentelemetry.exporter.otlp.proto.grpc.exporter 1988113 139409655584320]: Transient error StatusCode.DEADLINE_EXCEEDED encountered while exporting logs to foo:4317, retrying in 1s.
[2025-05-21 21:39:34,171] [WARNING] in [opentelemetry.exporter.otlp.proto.grpc.exporter 1988113 139409672369728]: Transient error StatusCode.DEADLINE_EXCEEDED encountered while exporting traces to foo:4317, retrying in 1s.
But when pressing CTRL+C (KeyboardInterrupt) to shut down the application, you can end up in an infinite loop as well like:
[2025-05-21 21:39:43,240] [INFO] in [opentelemetry.sdk._logs._internal.export 1987568 135592880875072]: Shutdown called, ignoring log.
[2025-05-21 21:39:43,240] [INFO] in [opentelemetry.sdk._logs._internal.export 1987568 135592880875072]: Shutdown called, ignoring log.
[2025-05-21 21:39:43,240] [INFO] in [opentelemetry.sdk._logs._internal.export 1987568 135592880875072]: Shutdown called, ignoring log.
Exception in thread OtelBatchLogRecordProcessor:
Traceback (most recent call last):
File ".../lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 299, in _export
self._client.Export(
File ".../lib/python3.10/site-packages/grpc/_channel.py", line 1181, in __call__
return _end_unary_response_blocking(state, call, False, None)
File ".../lib/python3.10/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
self.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
self.callHandlers(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
hdlr.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
self.emit(record)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
logger.emit(self._translate(record))
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
self._multi_log_record_processor.emit(log_data)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
lp.emit(log_data)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 291, in emit
_logger.info("Shutdown called, ignoring log.")
File "/usr/lib/python3.10/logging/__init__.py", line 1477, in info
self._log(INFO, msg, args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
self.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
self.callHandlers(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
hdlr.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
self.emit(record)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
logger.emit(self._translate(record))
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
self._multi_log_record_processor.emit(log_data)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
lp.emit(log_data)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 291, in emit
_logger.info("Shutdown called, ignoring log.")
File "/usr/lib/python3.10/logging/__init__.py", line 1477, in info
self._log(INFO, msg, args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
self.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
self.callHandlers(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
hdlr.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
self.emit(record)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
logger.emit(self._translate(record))
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
self._multi_log_record_processor.emit(log_data)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2025-05-21T21:39:43.107572264+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
lp.emit(log_data)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 291, in emit
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 273, in _export
_logger.info("Shutdown called, ignoring log.")
File "/usr/lib/python3.10/logging/__init__.py", line 1477, in info
self._exporter.export(
File ".../lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/_log_exporter/__init__.py", line 111, in export
self._log(INFO, msg, args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
return self._export(batch)
File ".../lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 328, in _export
self.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
logger.warning(
File "/usr/lib/python3.10/logging/__init__.py", line 1489, in warning
self.callHandlers(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
self._log(WARNING, msg, args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
hdlr.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
self.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
self.emit(record)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
self.callHandlers(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
hdlr.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
logger.emit(self._translate(record))
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
self.emit(record)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
self._multi_log_record_processor.emit(log_data)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
lp.emit(log_data)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 291, in emit
logger.emit(self._translate(record))
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
_logger.info("Shutdown called, ignoring log.")
File "/usr/lib/python3.10/logging/__init__.py", line 1477, in info
self._multi_log_record_processor.emit(log_data)
File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
self._log(INFO, msg, args, **kwargs)
...
RecursionError: maximum recursion depth exceeded while calling a Python object
I think logging.Handler.handleError might be appropriate and pythonic way to handle this for SimpleLogRecordProcessor. It is intended to be called within custom handlers emit() function to avoid recursion, and we should probably do so in OTel's LoggingHandler.
One problem with the current Simple log processor is that it isn't letting the exception bubble back up to LoggingHandler and instead tries to handle it itself by calling logger.exception() code. The recursions looks like
logging.info()...
OTel LoggingHandler.emit()
| OTel Logger.emit()
| OTel SimpleLogRecordProcessor.on_emit()
| OTel Exporter.emit()
| raise FooError()
| <-------|
| logger.exception()
|--------|
We can't really let the exception propagate back through the OTel Logger.emit call because we never raise exceptions from the API. We could stick a flag into context from OTel LoggingHandler that signals the SimpleLogRecordProcessor to allow the exception to bubble back up? Or stick a lambda into context to call LoggingHandler.handleError(log_record) further down the stack and return immediately.
I'll try to send a PR for this.
Sent out https://github.com/open-telemetry/opentelemetry-python/pull/4799