
experimental: introduce img understand pipeline

Open dolfim-ibm opened this issue 1 year ago • 2 comments

This new feature introduces an ImgUnderstand pipeline that uses vision LLMs to describe the pictures contained in documents.

The pipeline supports:

  1. a local LLM, via vLLM
  2. an LLM as a service, e.g. on watsonx.ai or via OpenAI-compatible APIs
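For option 2, a minimal sketch of what a request could look like, assuming an OpenAI-compatible `/v1/chat/completions` endpoint. The model name, prompt, and `max_tokens` value below are illustrative placeholders, not the pipeline's actual configuration:

```python
import base64
import json

def build_describe_request(image_bytes: bytes,
                           model: str = "some-vision-model") -> dict:
    """Build an OpenAI-compatible chat payload asking a vision LLM
    to describe a picture extracted from a document."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,  # placeholder name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe this picture in detail."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
        "max_tokens": 256,
    }

# The same payload shape works against a local vLLM server
# (`vllm serve <model>`) or a hosted OpenAI-compatible API.
payload = build_describe_request(b"\x89PNG...")
print(json.dumps(payload)[:60])
```

Because the wire format is the same, switching between a local vLLM deployment and a hosted service only changes the base URL and credentials, not the request-building code.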

Checklist:

  • [x] Commit Message Formatting: Commit titles and messages follow the Conventional Commits guidelines.
  • [ ] Documentation has been updated, if necessary.
  • [ ] Examples have been added, if necessary.
  • [ ] Tests have been added, if necessary.

dolfim-ibm avatar Sep 22 '24 18:09 dolfim-ibm

Offline LLM

vLLM

Pros:

  • efficiently runs vision models offline; see the docs page.
  • supports different models without further specialization
  • already used by InstructLab and part of RHEL AI

Cons:

  • no macOS support (any architecture)
  • vLLM pins torch to an exact version, which causes dependency-resolution conflicts with Poetry:
    • vllm==0.5.x depends on torch==2.3.0
    • vllm==0.6.x depends on torch==2.4.0
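The pinning conflict looks roughly like this in a Poetry project (version numbers per the notes above; the surrounding constraints are illustrative):

```toml
# pyproject.toml (illustrative fragment)
[tool.poetry.dependencies]
python = "^3.10"
# vllm 0.5.x hard-pins torch:
vllm = "0.5.4"    # requires torch==2.3.0 exactly
# any other dependency wanting a newer torch now fails to resolve:
torch = ">=2.4"   # SolverProblemError: incompatible with vllm's torch==2.3.0
```

Because Poetry resolves the whole dependency graph up front, a single exact pin like this blocks every other package that needs a different torch release.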

HF Transformers

Pros:

  • no strong pinning of torch

Cons:

  • more code needed
  • different models require different implementations, e.g. llava-next differs from phi-3-v.
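A sketch of why that last point means extra code: each model family expects its own image placeholder and prompt template, so the pipeline needs per-family glue. The template strings and family keys below are illustrative approximations, not verified exact formats:

```python
# Per-model-family prompt builders. The templates are rough
# approximations of each family's expected format, for illustration only.
PROMPT_BUILDERS = {
    # LLaVA-NeXT style: instruction markers plus an <image> token
    "llava-next": lambda q: f"[INST] <image>\n{q} [/INST]",
    # Phi-3-vision style: numbered image placeholder plus chat markers
    "phi-3-v": lambda q: f"<|user|>\n<|image_1|>\n{q}<|end|>\n<|assistant|>\n",
}

def build_prompt(model_family: str, question: str) -> str:
    """Dispatch to the right prompt template for the chosen model family."""
    try:
        return PROMPT_BUILDERS[model_family](question)
    except KeyError:
        raise ValueError(f"no prompt template for {model_family!r}")

print(build_prompt("llava-next", "Describe this picture."))
```

With vLLM's OpenAI-compatible server this dispatch is handled server-side; going through HF Transformers directly, every new model family adds another entry like these.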

dolfim-ibm avatar Sep 22 '24 18:09 dolfim-ibm

@dolfim-ibm could we not use some standard HF models (e.g. Florence and OneChart)?

PeterStaar-IBM avatar Sep 24 '24 04:09 PeterStaar-IBM

superseded by #259

dolfim-ibm avatar Nov 06 '24 10:11 dolfim-ibm