docling
docling copied to clipboard
Integrate image understanding pipeline
As part of the enrichment pipeline we want to leverage multi-modal vision models for the analysis of images in documents.
For example:
- Charts
- UML diagrams
- and more
An initial prototype is in #25, which will be re-implemented on top of the stronger v2 pipelines.
Runtime
The system will support:
- Prompting a model served as API, e.g. using the openai vision api
- Launching a local model, e.g. using vllm