docling icon indicating copy to clipboard operation
docling copied to clipboard

Integrate image understanding pipeline

Open dolfim-ibm opened this issue 1 year ago • 0 comments

As part of the enrichment pipeline we want to leverage multi-modal vision models for the analysis of images in documents.

For example:

  • Charts
  • UML diagrams
  • and more

An initial prototype is in #25, which will be re-implemented on top of the stronger v2 pipelines.

Runtime

The system will support:

  1. Prompting a model served as API, e.g. using the openai vision api
  2. Launching a local model, e.g. using vllm

dolfim-ibm avatar Nov 01 '24 12:11 dolfim-ibm