bug/`partition_xlsx` function raises TypeError with `infer_table_structure = False` and `find_subtable = False`

Open bawgz opened this issue 1 year ago • 0 comments

Describe the bug: Calling `partition_xlsx(file_path, infer_table_structure=False, find_subtable=False)` raises `TypeError: object of type 'NoneType' has no len()`.

Stack trace:

Traceback (most recent call last):
  File "/Users/lb/js-apps/xlsx-parser/processor.py", line 87, in <module>
    processor.process(file_path)
  File "/Users/lb/js-apps/xlsx-parser/processor.py", line 27, in process
    file = partition_xlsx(file_path, languages=['en'], infer_table_structure=False, find_subtable=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/documents/elements.py", line 605, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/file_utils/filetype.py", line 731, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/file_utils/filetype.py", line 687, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/partition/xlsx.py", line 118, in partition_xlsx
    text = soupparser_fromstring(html_text).text_content()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/lxml/html/soupparser.py", line 33, in fromstring
    return _parse(data, beautifulsoup, makeelement, **bsargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/lxml/html/soupparser.py", line 78, in _parse
    tree = beautifulsoup(source, **bsargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/bs4/__init__.py", line 315, in __init__
    elif len(markup) <= 256 and (
         ^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()
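
Reading the trace: `html_text` is `None` by the time it is passed to `soupparser_fromstring`, and `bs4` then calls `len()` on it. A minimal sketch of a `None` guard follows, using only the standard library so it runs without `unstructured`, `lxml`, or `bs4` installed. The helper name `text_from_html` and the `_TextCollector` class are hypothetical and are not part of `unstructured`; that `html_text` is `None` because `infer_table_structure=False` is an assumption based on the arguments in the call, not something the trace itself confirms.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment (stdlib stand-in for lxml)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        # Called once per run of text between tags.
        self.chunks.append(data)


def text_from_html(html_text: Optional[str]) -> str:
    """Hypothetical helper: return the text content, or "" when html_text is None."""
    if html_text is None:
        # Guard: parsing None is what ends in the len() TypeError shown in the trace.
        return ""
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()
    return "".join(collector.chunks)
```

With a guard of this shape, a missing HTML representation yields an empty string instead of an exception. Whether the right fix inside `partition_xlsx` is an empty string or taking the text from another source is for the maintainers to decide.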

To Reproduce: `partition_xlsx(file_path, infer_table_structure=False, find_subtable=False)`

Expected behavior: No error.

Screenshots N/A

Environment Info N/A

Additional context N/A

bawgz, Sep 25 '24 01:09