opentelemetry-python icon indicating copy to clipboard operation
opentelemetry-python copied to clipboard

Audit for possible recursion error in logging code

Open xrmx opened this issue 1 year ago • 3 comments

What happened?

We recently fixed a recursion error in logging (#4255) switching a call from logging to warnings. We need to audit for this kind of issues to other part of the logging code. We want to also evaluate if warnings is really the right abastraction because of integrations with the logging module that may lead to the same issue https://docs.python.org/3/library/logging.html#integration-with-the-warnings-module

Would you like to implement a fix?

None

xrmx avatar Nov 08 '24 16:11 xrmx

To document another scenario where this happens is if the OTLP endpoint is misconfigured (unknown host), you can get the expected intermittent transient errors like:

[2025-05-21 21:39:32,087] [WARNING] in [opentelemetry.exporter.otlp.proto.grpc.exporter 1987568 135592897660480]: Transient error StatusCode.DEADLINE_EXCEEDED encountered while exporting traces to foo:4317, retrying in 1s.
[2025-05-21 21:39:32,101] [WARNING] in [opentelemetry.exporter.otlp.proto.grpc.exporter 1987568 135592880875072]: Transient error StatusCode.DEADLINE_EXCEEDED encountered while exporting logs to foo:4317, retrying in 1s.
[2025-05-21 21:39:34,170] [WARNING] in [opentelemetry.exporter.otlp.proto.grpc.exporter 1988113 139409655584320]: Transient error StatusCode.DEADLINE_EXCEEDED encountered while exporting logs to foo:4317, retrying in 1s.
[2025-05-21 21:39:34,171] [WARNING] in [opentelemetry.exporter.otlp.proto.grpc.exporter 1988113 139409672369728]: Transient error StatusCode.DEADLINE_EXCEEDED encountered while exporting traces to foo:4317, retrying in 1s.

But when pressing CTRL+C (KeyboardInterrupt) to shut down the application, you can end up in an infinite loop as well like:

[2025-05-21 21:39:43,240] [INFO] in [opentelemetry.sdk._logs._internal.export 1987568 135592880875072]: Shutdown called, ignoring log.
[2025-05-21 21:39:43,240] [INFO] in [opentelemetry.sdk._logs._internal.export 1987568 135592880875072]: Shutdown called, ignoring log.
[2025-05-21 21:39:43,240] [INFO] in [opentelemetry.sdk._logs._internal.export 1987568 135592880875072]: Shutdown called, ignoring log.
Exception in thread OtelBatchLogRecordProcessor:
Traceback (most recent call last):
  File ".../lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 299, in _export
    self._client.Export(
  File ".../lib/python3.10/site-packages/grpc/_channel.py", line 1181, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File ".../lib/python3.10/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    self.emit(record)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
    logger.emit(self._translate(record))
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
    self._multi_log_record_processor.emit(log_data)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
    lp.emit(log_data)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 291, in emit
    _logger.info("Shutdown called, ignoring log.")
  File "/usr/lib/python3.10/logging/__init__.py", line 1477, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    self.emit(record)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
    logger.emit(self._translate(record))
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
    self._multi_log_record_processor.emit(log_data)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
    lp.emit(log_data)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 291, in emit
    _logger.info("Shutdown called, ignoring log.")
  File "/usr/lib/python3.10/logging/__init__.py", line 1477, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    self.emit(record)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
    logger.emit(self._translate(record))
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
    self._multi_log_record_processor.emit(log_data)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2025-05-21T21:39:43.107572264+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
    lp.emit(log_data)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 291, in emit
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 273, in _export
    _logger.info("Shutdown called, ignoring log.")
  File "/usr/lib/python3.10/logging/__init__.py", line 1477, in info
    self._exporter.export(
  File ".../lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/_log_exporter/__init__.py", line 111, in export
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
    return self._export(batch)
  File ".../lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 328, in _export
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    logger.warning(
  File "/usr/lib/python3.10/logging/__init__.py", line 1489, in warning
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    self.emit(record)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    logger.emit(self._translate(record))
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
    self.emit(record)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 569, in emit
    self._multi_log_record_processor.emit(log_data)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
    lp.emit(log_data)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 291, in emit
    logger.emit(self._translate(record))
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 610, in emit
    _logger.info("Shutdown called, ignoring log.")
  File "/usr/lib/python3.10/logging/__init__.py", line 1477, in info
    self._multi_log_record_processor.emit(log_data)
  File ".../lib/python3.10/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 330, in emit
    self._log(INFO, msg, args, **kwargs)

  ...
  RecursionError: maximum recursion depth exceeded while calling a Python object

devmonkey22 avatar May 21 '25 21:05 devmonkey22

I think logging.Handler.handleError might be appropriate and pythonic way to handle this for SimpleLogRecordProcessor. It is intended to be called within custom handlers emit() function to avoid recursion, and we should probably do so in OTel's LoggingHandler.

One problem with the current Simple log processor is that it isn't letting the exception bubble back up to LoggingHandler and instead tries to handle it itself by calling logger.exception() code. The recursions looks like

logging.info()...
    OTel LoggingHandler.emit()
    |    OTel Logger.emit()
    |        OTel SimpleLogRecordProcessor.on_emit()
    |            OTel Exporter.emit()
    |                raise FooError()
    |        <-------|                       
    |        logger.exception()
    |--------|

We can't really let the exception propagate back through the OTel Logger.emit call because we never raise exceptions from the API. We could stick a flag into context from OTel LoggingHandler that signals the SimpleLogRecordProcessor to allow the exception to bubble back up? Or stick a lambda into context to call LoggingHandler.handleError(log_record) further down the stack and return immediately.

I'll try to send a PR for this.

aabmass avatar Oct 24 '25 03:10 aabmass

Sent out https://github.com/open-telemetry/opentelemetry-python/pull/4799

DylanRussell avatar Nov 07 '25 21:11 DylanRussell