Michele Dolfi
Welcome and verbose status messages should be printed only when operating in interactive mode, e.g. in the CLI or in Jupyter notebooks.
We want to create a complete example which generates EPUB documents from the input PDF.
The majority of the Deep Search Experience system's users will operate on a single (auto-assigned) project. Many of the CLI functions could be simplified, such that no explicit `proj_key` will...
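One way the interactive-mode check described above could look is sketched here. This is a minimal illustration, not the project's implementation: the helper names `is_interactive` and `status` are hypothetical, and the detection heuristic (an attached TTY, or a loaded `ipykernel` module for Jupyter) is an assumption.

```python
import sys
from typing import Optional


def is_interactive() -> bool:
    """Best-effort guess at whether a human is watching the output (assumed heuristic)."""
    # Jupyter kernels import the `ipykernel` package, so its presence hints at a notebook.
    if "ipykernel" in sys.modules:
        return True
    # In a terminal session stdout is attached to a TTY; in pipes or batch jobs it is not.
    stream = sys.stdout
    return bool(getattr(stream, "isatty", None) and stream.isatty())


def status(message: str, interactive: Optional[bool] = None) -> bool:
    """Print `message` only in interactive mode; return whether it was printed."""
    if interactive is None:
        interactive = is_interactive()
    if interactive:
        print(message)
    return interactive
```

Passing `interactive` explicitly lets callers (or tests) override the detection, for example to silence output in a batch job.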
## Why are these changes needed?
In this PR we have updates for the `pdf2parquet` transform.
- Update to the new Docling version
- Leverage the lazy OCR feature
- ...
With this PR we expose more options in the CLI, i.e. which PDF backend and which table model to use. Note: I had to change the enum, because it was serializing...
As part of the enrichment pipeline we want to leverage multi-modal vision models for the analysis of images in documents. For example:
- Charts
- UML diagrams
- and more...
HTML documents can easily be converted (or printed) to PDF. The advantage of this process is that the printing process generates proper layout and visualization components such as pages, bounding boxes,...
DOCX documents can easily be converted (or printed) to PDF. The advantage of this process is that the printing process generates proper layout and visualization components such as pages, bounding boxes,...
In this issue we keep track of the Python 3.13 support in Docling and its components. In most cases, we have to wait before declaring complete support, until the...
This new feature creates a new `ImgUnderstand` pipeline which uses vision LLMs to describe the pictures contained in documents. The pipeline allows using 1. Local LLM, via vLLM 2....