opentelemetry-python icon indicating copy to clipboard operation
opentelemetry-python copied to clipboard

RecursionError in SimpleLogRecordProcessor after shutdown method being called

Open OS-joaocastilho opened this issue 2 years ago • 4 comments

Describe your environment Python version: 3.10

Steps to reproduce

  • Initialize an instance of the SimpleLogRecordProcessor class.
  • Execute an action that call the shutdown method on the processor instance.
  • Emit a log after the shutdown method has been called.

What is the expected behavior? The SimpleLogRecordProcessor should gracefully handle the log emission after the shutdown method has been called and not generate a RecursionError.

What is the actual behavior? The class generates a RecursionError due to the warning logic in the emit method, creating a recursion loop when another log is emitted after the shutdown method is being called.

Additional context Comparing this behavior with the BatchLogRecordProcessor, it handles the scenario differently, it just returns and do not raise any warning, avoiding the RecursionError. The SimpleLogRecordProcessor should adopt a similar approach and not raise warnings to prevent the recursion loop and associated error.

OS-joaocastilho avatar Jan 08 '24 17:01 OS-joaocastilho

Can you share a concrete example that will generate the error?

chittalpatel avatar Mar 06 '24 22:03 chittalpatel

Tried to reproduce without success

Running this

https://gist.github.com/msscaroso/6f9d1c0eb63ab0009d0e63b47b996c5c

Gives

Processor is already shutdown, ignoring call

msscaroso avatar Jun 15 '24 13:06 msscaroso

@OS-joaocastilho Could you share more details on how to reproduce?

msscaroso avatar Jun 15 '24 13:06 msscaroso

@OS-joaocastilho

Gentle ping on this issue. Could you respond to the above ask on how to reproduce this? Perhaps the issue has already been addressed.

lzchen avatar Jul 02 '24 19:07 lzchen

Hey @lzchen and @msscaroso,

This issue occurs in a multi-threading service where the other threads don't realize that the log processor has already shut down.

I created a simpler example where I forced the shutdown of the log processor. This example demonstrates that the service enters a recursion loop. This happens because in SimpleLogProcessor when it tries to emit, it checks if it is already shut down. If it is, it sends a warning message, which eventually causes another issue when it emits the warning message.

Here's a snapshot of a simple code example:

import logging

from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import SimpleLogRecordProcessor
from opentelemetry.sdk._logs._internal.export import ConsoleLogExporter
from opentelemetry._logs import set_logger_provider

logger_provider = LoggerProvider()
set_logger_provider(logger_provider)

exporter = ConsoleLogExporter()
log_record_processor = SimpleLogRecordProcessor(exporter)
logger_provider.add_log_record_processor(log_record_processor)

handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)

root_logger = logging.getLogger()

root_logger.handlers.clear()

root_logger.addHandler(handler)

root_logger.setLevel("INFO")

logging.info("Test")

log_record_processor.shutdown()

logging.warning("The logger is already shutdown")

These are the package versions:

opentelemetry-exporter-otlp==1.24.0
opentelemetry-exporter-otlp-proto-common==1.24.0
opentelemetry-exporter-otlp-proto-grpc==1.24.0
opentelemetry-exporter-otlp-proto-http==1.24.0
opentelemetry-instrumentation==0.45b0
opentelemetry-instrumentation-asgi==0.45b0
opentelemetry-instrumentation-fastapi==0.45b0
opentelemetry-instrumentation-logging==0.45b0
opentelemetry-proto==1.24.0
opentelemetry-sdk==1.24.0
opentelemetry-semantic-conventions==0.45b0
opentelemetry-util-http==0.45b0

OS-joaocastilho avatar Jul 03 '24 08:07 OS-joaocastilho

I have another way to reproduce the RecursionError. If using Otel Python SDK for OTLP log export by HTTP and the endpoint is unavailable, then a ConnectionError happens and a repeating error stack is returned. I've attached full log here: tmp59.log

To reproduce, manually instrument a script (similar to @OS-joaocastilho 's example) with the HTTP version of OTLPLogExporter pointed at http://does-not-exist and SimpleLogRecordProcessor:

import logging

from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import (
    BatchLogRecordProcessor,
    SimpleLogRecordProcessor
)
from opentelemetry.exporter.otlp.proto.http._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry._logs import set_logger_provider

logger_provider = LoggerProvider()
set_logger_provider(logger_provider)

exporter = OTLPLogExporter(
    endpoint="http://does-not-exist"
)

log_record_processor = SimpleLogRecordProcessor(exporter)
# log_record_processor = BatchLogRecordProcessor(exporter)

logger_provider.add_log_record_processor(log_record_processor)

handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)

root_logger = logging.getLogger()
root_logger.addHandler(handler)

root_logger.warning("A root log!!")

Swapping to BatchLogRecordProccessor doesn't give the error, as already mentioned in OP.

The repeating error stack includes this line: _logger.exception("Exception while exporting logs."). I think it's repeated failures to export log records that result in creation of more log records and therefore RecursionError.

A quick fix could be to not log in this case and print instead. I have a PR for that here: https://github.com/open-telemetry/opentelemetry-python/pull/4088

Separate question: Should the HTTP OTLP exporters continue to raise the ConnectionError like this?

tammy-baylis-swi avatar Jul 26 '24 22:07 tammy-baylis-swi

Doesn't this fall under the support category of log telemetry being collected from the api/sdk itself, which is undesired? Our recommendation for users is to not use the root logger with the Otel LogHandler and to use namespaced loggers so log telemetry is only collected on the user's application.

lzchen avatar Jul 29 '24 20:07 lzchen

Our recommendation for users is to not use the root logger with the Otel LogHandler and to use namespaced loggers so log telemetry is only collected on the user's application.

Good point, and if I change the bottom of the example to this then there is no RecursionError:

foo_logger = logging.getLogger("foo_logger")
foo_logger.addHandler(handler)

foo_logger.warning("A foo log!!")

This opentelemetry.io Getting Started example does show use of a logger by __name__.

This opentelemetry.io Zero Code page links to a Python example that uses the root logger because it's a direct call to logging.info without prior getLogger("some name"), like the simple example. I'll put in a separate PR to update the example: https://github.com/open-telemetry/opentelemetry-python/pull/4090

tammy-baylis-swi avatar Jul 29 '24 22:07 tammy-baylis-swi

Good point, and if I change the bottom of the example to this then there is no RecursionError:

Is this pr still needed in that case?

lzchen avatar Jul 30 '24 16:07 lzchen

It would be nice to know if switching to a namespaced logger helps with OP's use case:

This issue occurs in a multi-threading service where the other threads don't realize that the log processor has already shut down.

@OS-joaocastilho does it make a difference in your original service if logging.getLogger(__name__) instead of logging.getLogger()?

Is https://github.com/open-telemetry/opentelemetry-python/pull/4088 still needed in that case?

@lzchen I'm thinking probably not! 🙂

tammy-baylis-swi avatar Jul 30 '24 16:07 tammy-baylis-swi