amazon-textract-textractor: Use module name for logger instead of root logger

Typically, it's best practice for Python logging to use logging.getLogger(__name__).

However, the ResponseParser simply does import logging and then calls logging.info(...). This results in the root logger being used, as if the logger were logging.getLogger("root").

For example: https://github.com/aws-samples/amazon-textract-textractor/blob/9df5d268dead3f42104cde2f766cb16be3f93d95/textractor/parsers/response_parser.py#L148
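A minimal sketch of the difference, using only the standard library: a module-level call such as logging.info() produces a record whose logger name is "root", while a logger obtained with getLogger(__name__) carries the module path. The name "textractor.parsers.response_parser" is assumed here as what __name__ would evaluate to for the file linked above; ListHandler is a hypothetical helper that only collects records for inspection.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: collects emitted records in memory for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


root = logging.getLogger()  # the real root logger
root.setLevel(logging.INFO)
handler = ListHandler()
root.addHandler(handler)

# Module-level convenience function: delegates to the root logger.
logging.info("parsed page")

# Named logger (assumed __name__ of response_parser.py): the record carries that name.
logging.getLogger("textractor.parsers.response_parser").info("parsed page")

for record in handler.records:
    print(record.name)
# root
# textractor.parsers.response_parser
```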

The ResponseParser produces a large volume of logs that spam our server logs. Because it logs through the root logger, the only way to filter them out is to apply a logging filter:

import logging


class TextractFilter(logging.Filter):
    """
    Since Textract uses the root logger, we cannot set the logger level
    without affecting other usage of the root logger.
    With this filter, we are able to filter INFO logs from Textract.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        # Drop INFO records emitted from textractor's response_parser module
        return not (record.module == "response_parser" and record.levelno == logging.INFO)


def configure_loggers():
    # This is for Textract to not spam the Server with INFO logs.
    # getLogger() with no name always returns the root logger
    # (getLogger("root") only does so on Python 3.9+).
    logging.getLogger().addFilter(TextractFilter())
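To check that a filter like TextractFilter behaves as described, here is a small sketch that builds LogRecord objects by hand. The pathnames and the make_record helper are hypothetical; LogRecord derives record.module from the file name in pathname, which is the attribute the filter inspects.

```python
import logging


class TextractFilter(logging.Filter):
    """Drops INFO records emitted from textractor's response_parser module."""

    def filter(self, record: logging.LogRecord) -> bool:
        return not (record.module == "response_parser" and record.levelno == logging.INFO)


def make_record(pathname: str, level: int) -> logging.LogRecord:
    # Hypothetical helper: LogRecord sets record.module from the file name in pathname.
    return logging.LogRecord("root", level, pathname, 148, "msg", None, None)


f = TextractFilter()
parser_info = make_record("textractor/parsers/response_parser.py", logging.INFO)
parser_warning = make_record("textractor/parsers/response_parser.py", logging.WARNING)
app_info = make_record("myapp/views.py", logging.INFO)

print(f.filter(parser_info), f.filter(parser_warning), f.filter(app_info))
# False True True
```

A filter attached to a logger is only consulted for records logged directly on that logger, not for records propagated up from descendant loggers. Attaching TextractFilter to the root logger works here precisely because the ResponseParser logs through the root logger itself.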


The TextractFilter works, but it is not best practice when all I want to do is something like:

logging.getLogger("textractor").setLevel(logging.WARNING)

Is there a better approach, or could we change the logger to use the module __name__ so that it is more configurable? Thank you :)
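A minimal sketch of the proposed change, with both sides in one file for illustration. The library side is a hypothetical stand-in for response_parser.py after switching to a module-level logger (the literal name is what __name__ would be); the application side then needs only the one setLevel call quoted above. ListHandler, parse_page and the "myapp" logger are hypothetical.

```python
import logging

# Library side (hypothetical response_parser.py after the change):
# one module-level logger named after the module, instead of logging.info().
logger = logging.getLogger("textractor.parsers.response_parser")  # i.e. __name__


def parse_page() -> None:
    logger.info("parsed page")       # chatty detail
    logger.warning("missing block")  # still worth seeing


class ListHandler(logging.Handler):
    """Hypothetical helper: collects formatted messages in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Application side: configure the root logger once...
root = logging.getLogger()
root.setLevel(logging.INFO)
handler = ListHandler()
root.addHandler(handler)

# ...and quiet the whole textractor package with a single call.
logging.getLogger("textractor").setLevel(logging.WARNING)

parse_page()
logging.getLogger("myapp").info("app still logs at INFO")
print(handler.messages)
# ['missing block', 'app still logs at INFO']
```

Logger names form a dot-separated hierarchy, so the level set on "textractor" applies to every logger beneath it that has no level of its own, while the root logger and unrelated application loggers keep logging at INFO.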

May 20 '24 21:05 michaelshum321

Up! I need this same fix. I'm having exactly the same problem as described by @michaelshum321.

I'd make the same suggestion:

Is there a better approach, or could we change the logger to use the module name to be better configurable? thank you :)

Is it possible to implement it? Thanks :)

Jul 03 '24 10:07 JorgeMSL

+1

Oct 15 '24 15:10 DSLituiev

We will pick this up for the upcoming 1.9.0.

Oct 15 '24 17:10 Belval

I don't know if this is related, but I have these lines in my codebase to set up my own application logging config:

from src.config.loader import CONFIG
from src.utils.logger import get_logger
logger = get_logger(CONFIG.logging.name, CONFIG.logging.level)

Then, in the debug console, I can test it with: logger.error("test")

I noticed that if I debug as described, I get the desired result, i.e. the console output uses my formatting. However, if I instead try:

from textractor.parsers import response_parser
from src.config.loader import CONFIG
from src.utils.logger import get_logger
logger = get_logger(CONFIG.logging.name, CONFIG.logging.level)

I get duplicated output: one line formatted as desired, and one formatted by something that I guess comes from textractor.

How can I fix this?
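One likely cause, offered as an assumption rather than a diagnosis: the standard library's module-level functions such as logging.info() call logging.basicConfig() when the root logger has no handlers, which attaches a default stderr handler to the root logger. If textractor makes such a call before your own setup runs, each record from your named logger is printed by its own handler and then again by the root handler it propagates to. The sketch below reproduces this with an explicit basicConfig(force=True) standing in for the implicit call, and in-memory streams instead of the console; the "myservice" logger is a hypothetical stand-in for get_logger().

```python
import io
import logging

# Stand-in for the implicit basicConfig() triggered by a module-level
# logging.info() call: a default-format handler on the root logger.
root_out = io.StringIO()
logging.basicConfig(stream=root_out, level=logging.INFO, force=True)

# Hypothetical stand-in for get_logger(): a named logger with its own handler.
app_out = io.StringIO()
app_logger = logging.getLogger("myservice")
app_logger.setLevel(logging.INFO)
app_handler = logging.StreamHandler(app_out)
app_handler.setFormatter(logging.Formatter("[myservice] %(levelname)s %(message)s"))
app_logger.addHandler(app_handler)

app_logger.error("test")
print(repr(app_out.getvalue()))   # '[myservice] ERROR test\n'
print(repr(root_out.getvalue()))  # 'ERROR:myservice:test\n'  (the duplicate)

# One possible fix: stop the application logger from propagating to root.
app_logger.propagate = False
app_logger.error("test again")
print(repr(root_out.getvalue()))  # unchanged: 'ERROR:myservice:test\n'
```

Setting propagate = False on the application logger is one option; another is to attach your formatter to the root logger itself (for example with basicConfig(..., force=True) at startup), so that there is only one handler for records to reach.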

Nov 12 '24 13:11 LuchiLucs

+1 to this!

When I encountered this, I thought I could disable these logs with:

import logging
logging.getLogger("textractor").setLevel(logging.WARNING)

But I couldn't, because of this issue.
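A minimal sketch of why that setLevel call has no effect before the fix: a level set on the "textractor" logger only applies to that logger and its descendants, but a record emitted through the module-level logging.info() is created on the root logger, whose level is still INFO. ListHandler is a hypothetical in-memory collector.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: collects messages that reach this handler."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


root = logging.getLogger()
root.setLevel(logging.INFO)
handler = ListHandler()
root.addHandler(handler)

# The attempted fix: raise the level of the "textractor" logger...
logging.getLogger("textractor").setLevel(logging.WARNING)

# ...but a library calling the module-level function logs on the root logger,
# whose level is still INFO, so the record gets through.
logging.info("response_parser-style INFO message")
print(handler.messages)
# ['response_parser-style INFO message']
```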

Nov 23 '24 04:11 arjunnayak

@Belval, any update on when this will be released?

Dec 05 '24 11:12 tarujg

This was added in v1.9.0 by #419.

Mar 07 '25 21:03 Belval