amazon-textract-response-parser
`add_page_orientation` does not handle pages without words (e.g. blank pages) (Python)
add_page_orientation raises an error on documents with blank pages.
This happens because `statistics.mode` raises `StatisticsError` when its input data is empty (see the Python docs), and a page without words produces an empty list of word orientations.
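A minimal reproduction of the underlying failure, independent of the library. The helper name `mode_of_empty_raises` is purely illustrative; it only checks that `statistics.mode` rejects the empty list that a blank page would produce.

```python
import statistics


def mode_of_empty_raises() -> bool:
    """Return True if statistics.mode rejects an empty list, as it would for a blank page."""
    try:
        statistics.mode([])  # a blank page yields no word angles at all
    except statistics.StatisticsError:
        return True
    return False


print(mode_of_empty_raises())  # True
```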
This could be fixed with something along the lines of:
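```python
# Collect the rounded angle of every word that has polygon geometry
word_orientations = [
    round(__get_degree_from_polygon(w.geometry.polygon))
    for w in words
    if w.geometry and w.geometry.polygon
]
# statistics.mode raises StatisticsError on empty input, so fall back to
# 0 degrees for pages without words (e.g. blank pages)
orientation = statistics.mode(word_orientations) if word_orientations else 0
```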
Or some other alternative 🤷♂️
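For testing the idea outside the library, here is a self-contained sketch of the guarded computation. Everything in it is a hypothetical stand-in: `Point`, `Geometry` and `Word` mimic only the attributes the snippet touches, and `degree_from_polygon` assumes the angle is taken from the edge between the first two polygon points, which may not match the real `__get_degree_from_polygon` in `t_pipeline.py`.

```python
import math
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Geometry:
    polygon: List[Point] = field(default_factory=list)


@dataclass
class Word:
    geometry: Optional[Geometry] = None


def degree_from_polygon(polygon: List[Point]) -> float:
    # Hypothetical stand-in: angle (degrees) of the edge from the first
    # to the second polygon point, i.e. the word's top edge.
    p0, p1 = polygon[0], polygon[1]
    return math.degrees(math.atan2(p1.y - p0.y, p1.x - p0.x))


def page_orientation(words: List[Word]) -> int:
    # Same logic as the proposed fix: skip words without geometry and
    # fall back to 0 degrees when no angles are left (e.g. a blank page).
    word_orientations = [
        round(degree_from_polygon(w.geometry.polygon))
        for w in words
        if w.geometry and w.geometry.polygon
    ]
    return statistics.mode(word_orientations) if word_orientations else 0


horizontal = Word(Geometry([Point(0.1, 0.2), Point(0.3, 0.2), Point(0.3, 0.25), Point(0.1, 0.25)]))
rotated = Word(Geometry([Point(0.2, 0.1), Point(0.2, 0.3), Point(0.15, 0.3), Point(0.15, 0.1)]))

print(page_orientation([]))                               # 0 (blank page, no StatisticsError)
print(page_orientation([rotated, rotated, horizontal]))   # 90
```

Returning `0` for an empty page keeps the pipeline running, but it also makes a blank page indistinguishable from an upright one; returning `None` would be an alternative if callers need to tell the two apart.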
https://github.com/aws-samples/amazon-textract-response-parser/blob/541c07a12d603deed70699357f865d6974369c7b/src-python/trp/t_pipeline.py#L136-L150