
Adding color to the Text's metadata feat/text-color

Open LDelPinoNT opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe. I have a branded manual whose section titles are in a specific color. I would like to chunk the PDF into sections using color information.

Describe the solution you'd like When using partition_pdf with the "fast" strategy, the color of the text is stored in the metadata (and the documentation reflects it).
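To illustrate the intended use, here is a minimal sketch of color-based sectioning. It does not use the unstructured API: `ColoredText`, `Section`, `same_color`, and `chunk_by_color` are hypothetical names, and the `color` field stands in for the metadata this issue asks for (assumed here to be an RGB tuple with components in 0..1).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

RGB = Tuple[float, float, float]


@dataclass
class ColoredText:
    """Hypothetical stand-in for a partitioned element that carries a color."""
    text: str
    color: RGB


@dataclass
class Section:
    """A section title (None for text before the first title) plus its body."""
    title: Optional[str]
    body: List[str] = field(default_factory=list)


def same_color(a: RGB, b: RGB, tol: float = 0.02) -> bool:
    """Compare two colors component-wise within a small tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def chunk_by_color(elements: List[ColoredText], title_color: RGB,
                   tol: float = 0.02) -> List[Section]:
    """Start a new section at every element whose color matches title_color."""
    sections: List[Section] = []
    current: Optional[Section] = None
    for el in elements:
        if same_color(el.color, title_color, tol):
            # A title-colored element opens a new section.
            current = Section(title=el.text)
            sections.append(current)
        else:
            if current is None:
                # Text before the first title goes into an untitled section.
                current = Section(title=None)
                sections.append(current)
            current.body.append(el.text)
    return sections


# Illustrative data only; the colors and texts are made up.
BRAND = (0.0, 0.4, 0.8)
BLACK = (0.0, 0.0, 0.0)
elements = [
    ColoredText("Preface text", BLACK),
    ColoredText("1. Installation", BRAND),
    ColoredText("Unpack the device.", BLACK),
    ColoredText("Connect the cable.", BLACK),
    ColoredText("2. Maintenance", BRAND),
    ColoredText("Clean the filter monthly.", BLACK),
]
sections = chunk_by_color(elements, BRAND)
```

With color available in each element's metadata, the same loop could run directly over the output of partition_pdf, with no dependence on the detected category or on a character limit.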

Describe alternatives you've considered I already tried the "by_title" chunking strategy, but some text.category values are wrong, or the sections are chunked to approximately 500 characters despite setting max_partition to None.

Additional context Using unstructured from the Docker image.

LDelPinoNT Jul 16 '24 15:07