docling
consolidate advanced chunker notebook
Main improvements in this PR:
- Set chunk.text directly to the updated text (including any headings and captions)
- Add typing
- Switch to list comprehensions where possible
- Encapsulate all methods within the new chunker implementation
- Use a dataclass instead of an unmanaged dictionary
- List dependencies in the setup installation line
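
For reviewers, here is a minimal sketch of the pattern these changes describe: a typed dataclass in place of a loose dictionary, helper logic encapsulated in one chunker class, a list comprehension for the batch step, and the chunk text set directly to the enriched text. The names `ChunkRecord`, `SketchChunker`, `contextualize`, and `contextualize_all` are hypothetical and used only for illustration. They are not the notebook's or docling's actual API, and the ordering of headings before captions is an assumption.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ChunkRecord:
    """Hypothetical stand-in for a chunk (not docling's real class)."""

    text: str
    headings: list[str] = field(default_factory=list)
    captions: list[str] = field(default_factory=list)


class SketchChunker:
    """Hypothetical chunker that keeps all helper logic in one place."""

    def contextualize(self, chunk: ChunkRecord) -> ChunkRecord:
        # Assumed ordering: headings first, then captions, then the body.
        parts = [*chunk.headings, *chunk.captions, chunk.text]
        # Set the text directly to the enriched text, skipping empty parts.
        return replace(chunk, text="\n".join(p for p in parts if p))

    def contextualize_all(self, chunks: list[ChunkRecord]) -> list[ChunkRecord]:
        # List comprehension instead of an explicit append loop.
        return [self.contextualize(c) for c in chunks]
```

A typed record like this lets a type checker catch a misspelled field name that a plain dictionary key would silently accept.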