unstructured
bug/`partition_xlsx` function raises TypeError with `infer_table_structure = False` and `find_subtable = False`
Describe the bug
When calling `partition_xlsx(file_path, infer_table_structure=False, find_subtable=False)`, the following error occurs:
TypeError: object of type 'NoneType' has no len()
Stack trace:
Traceback (most recent call last):
  File "/Users/lb/js-apps/xlsx-parser/processor.py", line 87, in <module>
    processor.process(file_path)
  File "/Users/lb/js-apps/xlsx-parser/processor.py", line 27, in process
    file = partition_xlsx(file_path, languages=['en'], infer_table_structure=False, find_subtable=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/documents/elements.py", line 605, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/file_utils/filetype.py", line 731, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/file_utils/filetype.py", line 687, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/unstructured/partition/xlsx.py", line 118, in partition_xlsx
    text = soupparser_fromstring(html_text).text_content()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/lxml/html/soupparser.py", line 33, in fromstring
    return _parse(data, beautifulsoup, makeelement, **bsargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/lxml/html/soupparser.py", line 78, in _parse
    tree = beautifulsoup(source, **bsargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lb/js-apps/xlsx-parser/.venv/lib/python3.12/site-packages/bs4/__init__.py", line 315, in __init__
    elif len(markup) <= 256 and (
         ^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()
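Reading the trace bottom-up, `len(markup)` fails because `markup` is `None`, which means the `html_text` passed to `soupparser_fromstring` at `xlsx.py` line 118 was `None`. Why it is `None` is not shown in the trace; a plausible guess (an assumption, not confirmed here) is that no HTML is generated when `infer_table_structure=False`. The sketch below only illustrates the kind of `None` guard that would avoid the crash. It is not the library's code: `html_to_text` is a hypothetical helper, and the standard-library `html.parser` stands in for `lxml`'s `soupparser` so the example has no dependencies.

```python
from html.parser import HTMLParser
from typing import Optional


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html_text: Optional[str]) -> str:
    """Hypothetical helper: return the text content of html_text.

    Returns an empty string when html_text is None or empty instead of
    handing None to the parser, which is what triggers the TypeError above.
    """
    if not html_text:
        return ""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

With a guard like this, a missing HTML string yields empty text rather than an exception; whether empty text is the right fallback for `partition_xlsx` is a design decision for the maintainers.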
To Reproduce
partition_xlsx(file_path, infer_table_structure=False, find_subtable=False)
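A small wrapper can make the reproduction easier to run and report. This is only a sketch: `reproduce` is a hypothetical helper name, and the partition function is passed in as an argument so the snippet itself imports nothing from `unstructured`.

```python
from typing import Any, Callable, Optional


def reproduce(partition_fn: Callable[..., Any], file_path: str) -> Optional[str]:
    """Call partition_fn with the two flags from this report.

    Returns the TypeError message if the call fails that way, or None if
    the call completes. Other exception types are left to propagate.
    """
    try:
        partition_fn(file_path, infer_table_structure=False, find_subtable=False)
    except TypeError as exc:
        return str(exc)
    return None


# Usage against the real library (the import path is taken from the stack trace;
# "sample.xlsx" is a placeholder for any workbook on disk):
#   from unstructured.partition.xlsx import partition_xlsx
#   print(reproduce(partition_xlsx, "sample.xlsx"))
```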
Expected behavior No error
Screenshots N/A
Environment Info N/A
Additional context N/A