unstructured
feat: ability to skip non-plain-text element types in chunk_by_title()
Is your feature request related to a problem? Please describe.
chunk_by_title is a great way to combine related text elements. However, the caller may not want every element type, e.g. Table and Figure, to be combined with other element types when forming the CompositeElements.
Describe the solution you'd like
Add skip_element_types=['Table', 'Figure', <... and any other "non-plain text" elements>] to chunk_by_title, and also make this parameter accessible from the partition_ functions and unstructured-ingest.
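To make the requested behavior concrete, here is a rough sketch of how it could work as a wrapper around any chunker. Everything in it is an assumption for illustration: `chunk_with_skips`, the stand-in `Element` class, and the toy `join_all` chunker are hypothetical and not part of unstructured. In real use, `chunk_fn` would presumably be chunk_by_title, and the type check would rely on whatever attribute the library's elements expose for their type name.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Element:
    """Hypothetical stand-in for a document element (not the library class)."""
    category: str
    text: str


def chunk_with_skips(
    elements: Iterable[Element],
    skip_element_types: Iterable[str],
    chunk_fn: Callable[[List[Element]], List[Element]],
) -> List[Element]:
    """Chunk runs of elements, passing skipped types through unchanged.

    Elements whose category is in skip_element_types are never handed to
    chunk_fn; they are emitted as-is, in their original position.
    """
    skip = set(skip_element_types)
    result: List[Element] = []
    run: List[Element] = []  # consecutive elements that may be combined
    for el in elements:
        if el.category in skip:
            if run:
                result.extend(chunk_fn(run))  # flush the run before the skipped element
                run = []
            result.append(el)
        else:
            run.append(el)
    if run:
        result.extend(chunk_fn(run))
    return result


def join_all(run: List[Element]) -> List[Element]:
    """Toy chunker: merge a whole run into one composite element."""
    return [Element("CompositeElement", "\n\n".join(e.text for e in run))]


elements = [
    Element("Title", "Intro"),
    Element("NarrativeText", "Some text."),
    Element("Table", "a | b"),
    Element("NarrativeText", "More text."),
]
chunks = chunk_with_skips(elements, ["Table", "Figure"], join_all)
```

One design choice worth discussing: in this sketch a skipped element also ends the chunk in progress, so document order is preserved. Whether text on either side of a Table should instead be allowed to merge is left open by the request.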