amazon-textract-response-parser

`add_page_orientation` does not handle pages without words (e.g. blank pages) (Python)

Status: Open. MattExact opened this issue 2 years ago (0 comments).

`add_page_orientation` raises an error on documents that contain blank pages.

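A blank page has no words, so the list of word orientations passed to `statistics.mode` is empty, and `statistics.mode` raises `StatisticsError` when its input is empty (see the Python docs).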
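The failure can be reproduced with the standard library alone. The sketch below is a minimal reproduction, assuming only that a blank page produces an empty list of word orientations; the function name `mode_raises_on_empty` is hypothetical and not part of trp.

```python
import statistics


def mode_raises_on_empty() -> bool:
    """Return True if statistics.mode rejects an empty list, as happens for a blank page."""
    word_orientations = []  # a blank page contributes no words, hence no orientations
    try:
        statistics.mode(word_orientations)
    except statistics.StatisticsError:
        return True
    return False


print(mode_raises_on_empty())
```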

This could be fixed with something along the lines of:

word_orientations = [
    round(__get_degree_from_polygon(w.geometry.polygon))
    for w in words
    if w.geometry and w.geometry.polygon
]
orientation = statistics.mode(word_orientations) if word_orientations else 0
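A self-contained version of the same guard is sketched below. It is hypothetical: `page_orientation` is not a trp function, and `word_angles` stands in for the per-word degrees that trp derives from each word's polygon, with `None` marking words that have no geometry. A page with no usable words falls back to `default` (0, as in the suggestion) instead of raising.

```python
import statistics
from typing import Iterable, Optional


def page_orientation(word_angles: Iterable[Optional[float]], default: int = 0) -> int:
    """Most common rounded word angle on a page, or `default` if there are none.

    Hypothetical helper: `word_angles` stands in for the degrees trp computes
    from each word's polygon; None marks a word without geometry.
    """
    # Skip words without geometry, mirroring the `if w.geometry and w.geometry.polygon` filter.
    rounded = [round(a) for a in word_angles if a is not None]
    # Guard the empty case (e.g. a blank page): statistics.mode raises on empty data.
    return statistics.mode(rounded) if rounded else default


print(page_orientation([89.6, 90.3, 0.2]))  # rounds to 90, 90, 0 -> 90
print(page_orientation([]))                 # blank page -> 0
```

Returning 0 for a blank page treats it as upright; a caller that needs to tell "no words" apart from "upright" could pass a different `default`.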

Or some other alternative 🤷‍♂️
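One possible alternative, sketched here under the assumption of Python 3.8 or later, is `statistics.multimode`, which returns an empty list for empty input instead of raising. The helper name `page_orientation_multimode` is hypothetical, and it takes already-rounded angles.

```python
import statistics
from typing import List


def page_orientation_multimode(rounded_angles: List[int], default: int = 0) -> int:
    """Alternative guard using statistics.multimode (Python 3.8+).

    multimode returns [] for empty input rather than raising StatisticsError,
    so the empty case is handled by checking its result.
    """
    modes = statistics.multimode(rounded_angles)
    # On ties, multimode lists modes in first-seen order; modes[0] matches
    # what statistics.mode returns on Python 3.8+.
    return modes[0] if modes else default
```

This behaves the same as the conditional-expression version; it only moves the emptiness check from the input list to the result.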

https://github.com/aws-samples/amazon-textract-response-parser/blob/541c07a12d603deed70699357f865d6974369c7b/src-python/trp/t_pipeline.py#L136-L150

MattExact, Jul 25 '23 15:07