open-parse
ValueError: Coordinate 'right' is less than 'left'
Given this code:
```python
import openparse
basic_doc_path = "mydoc.pdf"
parser = openparse.DocumentParser(
    table_args={
        "parsing_algorithm": "unitable",
        "min_table_confidence": 0.8,
    }
)
parsed_basic_doc = parser.parse(basic_doc_path)
for node in parsed_basic_doc.nodes:
    print(node.json())
```
I'm getting the following error:
```
  File "/home/green/git/cl-langtools/test.py", line 11, in <module>
    parsed_basic_doc = parser.parse(basic_doc_path)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/green/git/cl-langtools/tools/open-parse/lib64/python3.12/site-packages/openparse/doc_parser.py", line 106, in parse
    table_elems = tables.ingest(doc, table_args_obj, verbose=self._verbose)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/green/git/cl-langtools/tools/open-parse/lib64/python3.12/site-packages/openparse/tables/parse.py", line 223, in ingest
    return _ingest_with_unitable(doc, parsing_args, verbose)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/green/git/cl-langtools/tools/open-parse/lib64/python3.12/site-packages/openparse/tables/parse.py", line 189, in _ingest_with_unitable
    table_str = table_img_to_html(table_img)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/green/git/cl-langtools/tools/open-parse/lib64/python3.12/site-packages/openparse/tables/unitable/core.py", line 192, in table_img_to_html
    pred_cell_lst = predict_cells(image_tensor, pred_bbox, table_image)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/green/git/cl-langtools/tools/open-parse/lib64/python3.12/site-packages/openparse/tables/unitable/core.py", line 160, in predict_cells
    _image_to_tensor(image.crop(bbox), size=(112, 448)) for bbox in pred_bboxes
                     ^^^^^^^^^^^^^^^^
  File "/home/green/git/cl-langtools/tools/open-parse/lib64/python3.12/site-packages/PIL/Image.py", line 1237, in crop
    raise ValueError(msg)
ValueError: Coordinate 'right' is less than 'left'
```
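For context, the last frame is Pillow's `Image.crop()`, which takes a box of the form `(left, upper, right, lower)`. The Pillow version in this traceback checks the box before cropping and raises if it is inverted. The stdlib-only sketch below mimics that check so the message is easy to reproduce; `validate_crop_box` is a hypothetical name for illustration, not part of Pillow or openparse.

```python
# Sketch of the sanity check Pillow's Image.crop() performs on its box argument.
# A box is (left, upper, right, lower) in pixel coordinates.
# `validate_crop_box` is a hypothetical helper, not a Pillow API.
Box = tuple[float, float, float, float]


def validate_crop_box(box: Box) -> None:
    """Raise the same ValueError messages Pillow raises for an inverted box."""
    left, upper, right, lower = box
    if right < left:
        raise ValueError("Coordinate 'right' is less than 'left'")
    if lower < upper:
        raise ValueError("Coordinate 'lower' is less than 'upper'")


# A cell box whose x-coordinates came out reversed (right=80 < left=120).
try:
    validate_crop_box((120.0, 10.0, 80.0, 40.0))
except ValueError as exc:
    print(exc)  # Coordinate 'right' is less than 'left'
```

So the error only says that some box handed to `crop()` had its right edge to the left of its left edge.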
If it helps, my input document is this one: https://www.rbc.com/investor-relations/_assets-custom/pdf/ar_2023_e.pdf
Thanks for this great library.
I'm also getting ValueError: Coordinate 'right' is less than 'left' with this PDF and almost the same code:
```python
import openparse
basic_doc_path = "sample.pdf"
parser = openparse.DocumentParser(
    table_args={
        "parsing_algorithm": "unitable",
        "min_table_confidence": 0.8
    },
)
parsed_doc = parser.parse(basic_doc_path)
```
Any updates on this one? I'm getting the same error. I'm assuming it means a table is in an unexpected position.
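Judging from the traceback, the box that fails is one of the per-cell boxes (`pred_bboxes`) that openparse's `predict_cells` crops out of the already-cropped table image, so an inverted cell prediction from UniTable seems a likelier trigger than the table's position on the page. One possible local workaround, sketched below under that assumption, is to repair or drop such boxes before cropping. `sanitize_bbox`, `sanitize_bboxes`, and the `min_size` threshold are hypothetical illustrations, not part of openparse, and wiring them in would mean patching your installed copy.

```python
# Hypothetical workaround sketch: repair predicted cell boxes before they reach
# Image.crop(). None of these names exist in openparse.
from typing import Optional

Box = tuple[float, float, float, float]  # (left, upper, right, lower)


def sanitize_bbox(box: Box, width: int, height: int, min_size: float = 1.0) -> Optional[Box]:
    """Reorder and clamp a box to the image bounds.

    Returns None when the repaired box is narrower or shorter than `min_size`
    pixels (an assumed threshold), i.e. too small to be a usable cell.
    """
    x0, y0, x1, y1 = box
    # Put coordinates in ascending order so right >= left and lower >= upper.
    left, right = sorted((x0, x1))
    upper, lower = sorted((y0, y1))
    # Clamp to the image so the crop never reaches outside the table image.
    left, right = max(0.0, left), min(float(width), right)
    upper, lower = max(0.0, upper), min(float(height), lower)
    if right - left < min_size or lower - upper < min_size:
        return None
    return (left, upper, right, lower)


def sanitize_bboxes(boxes: list[Box], width: int, height: int) -> list[Box]:
    """Drop boxes that cannot be repaired; keep the rest in their original order."""
    cleaned = (sanitize_bbox(b, width, height) for b in boxes)
    return [b for b in cleaned if b is not None]


# Usage: one inverted box, one valid box, one box entirely outside a 400x200 image.
boxes = [(120.0, 10.0, 80.0, 40.0), (10.0, 5.0, 60.0, 30.0), (500.0, 0.0, 600.0, 20.0)]
print(sanitize_bboxes(boxes, width=400, height=200))
# [(80.0, 10.0, 120.0, 40.0), (10.0, 5.0, 60.0, 30.0)]
```

Swapping coordinates keeps the crop from raising, but it guesses at what the model meant; dropping a box instead may leave a gap in the generated table HTML, so either choice trades a crash for possibly imperfect output on that table.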