Docling is unusable on long PDFs with CUDA
Bug
I am running Docling on an L4 GPU from Modal Labs. For some reason it only uses 1.4 GB of VRAM and is stuck at 3% GPU utilization regardless of the num_threads value I set. In the logs I see this:
INFO:docling.document_converter:Going to convert document batch...
Apr 05 19:51:59.941 | INFO:docling.document_converter:Initializing pipeline for StandardPdfPipeline with options hash e273ea4f9afaa9373468db25fd55d8a1
Apr 05 19:51:59.956 | INFO:docling.models.factories.base_factory:Loading plugin 'docling_defaults'
Apr 05 19:51:59.961 | INFO:docling.models.factories:Registered ocr engines: ['easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
Apr 05 19:52:00.196 | INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
Apr 05 19:52:01.149 | INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
Apr 05 19:52:01.905 | INFO:docling.models.factories.base_factory:Loading plugin 'docling_defaults'
Apr 05 19:52:01.910 | INFO:docling.models.factories:Registered picture descriptions: ['vlm', 'api']
INFO:docling.pipeline.base_pipeline:Processing document wlframework.pdf
...
Steps to reproduce
I am using this code to run Docling; I only want to extract figures and tables from the PDF:
extract_images_app = modal.App(name='extract_images-app')
docling_image = modal.Image.from_registry(
    "nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04", add_python="3.10"
).run_commands(
    "apt-get update",
    "apt-get install -y software-properties-common",
    "pip install PyPDF2",
    "pip install docling",
    "docling-tools models download",
    # force_build=True
)
vol = Volume.from_name("ingestion")
@extract_images_app.function(image=docling_image, volumes={"/ingestion": vol}, gpu="L4", timeout=3600)
def extract_images(input_pdf, extract_images=True, extract_tables=True):
    from docling.datamodel.document import ConversionResult
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode, AcceleratorDevice, AcceleratorOptions
    from docling_core.types.doc import ImageRefMode, PictureItem, TableItem
    import logging
    import os

    accelerator_options = AcceleratorOptions(
        num_threads=256, device=AcceleratorDevice.CUDA
    )

    # Configure logging
    logging.basicConfig(level=logging.INFO)
    _log = logging.getLogger(__name__)

    # Path where models were downloaded during the image build
    artifacts_path = "/root/.cache/docling/models"
    pipeline_options = PdfPipelineOptions(artifacts_path=artifacts_path)
    pipeline_options.do_ocr = False
    pipeline_options.accelerator_options = accelerator_options
    if extract_images or extract_tables:
        pipeline_options.images_scale = 2.0
        pipeline_options.generate_page_images = True
        pipeline_options.generate_picture_images = True
    if extract_tables:
        pipeline_options.do_table_structure = True
        pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE

    doc_converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
    )
    conv_result = doc_converter.convert(input_pdf)
    ...
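One thing worth flagging in the snippet above: as I understand it, num_threads in AcceleratorOptions controls CPU-side threading for the inference backend, not GPU parallelism, so num_threads=256 would not raise GPU utilization and can add scheduling overhead on a machine with far fewer cores. A small sanity-check sketch (sane_num_threads is my own helper name, not a docling API):

```python
import os

# num_threads in AcceleratorOptions sets CPU-side (intra-op) threads;
# it does not increase GPU parallelism. Values far above the core
# count (like 256) mostly add scheduling overhead.
def sane_num_threads(cpu_count: int) -> int:
    # cap at the available core count, minimum 1
    return max(1, cpu_count)

requested = 256
suggested = sane_num_threads(os.cpu_count() or 1)
print(f"requested={requested}, suggested<={suggested}")
```

On a typical Modal L4 container the core count is in the single or low double digits, so the requested 256 is far above it.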
Docling version
Version: 2.28.4 ...
Python version
Python 3.10 ...
After 45 minutes, Docling has still not processed my 700-page PDF. For comparison, a 30-page PDF finishes processing in about 30 seconds.
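To put numbers on that: at the throughput observed on the small file, the 700-page PDF should finish in roughly 12 minutes, so 45+ minutes with no result points to a superlinear slowdown with document length, not just more pages:

```python
# Rough throughput check based on the timings above.
small_pages, small_seconds = 30, 30
rate = small_pages / small_seconds        # pages per second (~1.0)
expected_minutes = 700 / rate / 60        # expected time for 700 pages
print(round(expected_minutes, 1))         # ~11.7 minutes at the same rate
```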
Same here. When we try to process long PDFs/Excel files with the latest version, it is about 3-4x slower than previous versions. We tried moving to the new backend, staying on v2, etc., and the only change that improves it is switching to the Fast Table model.
Have you tried that, @JamMaster1999?
@jaluma Thanks for that! I can confirm the issue is triggered by the accurate table model.
That said, even with pipeline_options.do_ocr = False and pipeline_options.do_table_structure = False
both turned off, and accelerator_options = AcceleratorOptions(num_threads=256, device=AcceleratorDevice.CUDA), I am still seeing very little GPU memory utilization. The majority of the memory usage is in RAM, not VRAM.
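For anyone who wants hard numbers while a conversion runs, a small polling helper can log VRAM and utilization alongside docling. The nvidia-smi query flags below are standard; parse_smi and gpu_stats are my own names, a sketch rather than anything from docling:

```python
import subprocess

def parse_smi(line: str) -> tuple[int, int]:
    # line looks like "5747, 9" (MiB used, % GPU utilization)
    used, util = (int(x.strip()) for x in line.split(","))
    return used, util

def gpu_stats() -> tuple[int, int]:
    # query the first GPU; requires nvidia-smi on PATH
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_smi(out.splitlines()[0])

print(parse_smi("5747, 9"))  # (5747, 9)
```

Calling gpu_stats() in a loop with a short sleep gives a utilization trace you can correlate with docling's per-stage log lines.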
Getting a similar issue. The PDF I used was 240 pages long; it took 12 minutes to process the whole thing with CUDA.
I have an RTX 3060 Ti (12 GB), and VRAM utilization stays at 4 GB.
Can confirm on my end too. Using docling-serve with an RTX 4090 and table_mode = fast, GPU utilization is around 3%, sometimes peaking at 15%.
It took 7 minutes to process the following 483-page PDF:
https://ftp.idu.ac.id/wp-content/uploads/ebook/tdg/DESIGN%20SISTEM%20DAYA%20GERAK/Fundamentals%20of%20Rocket%20Propulsion.pdf
import httpx

async_client = httpx.AsyncClient(timeout=6000.0)
url = "http://localhost:6001/v1alpha/convert/source"
payload = {
    "options": {
        "from_formats": ["pdf", "image"],
        "to_formats": ["md"],
        "image_export_mode": "placeholder",
        "do_ocr": True,
        "pipeline": "standard",
        "force_ocr": False,
        "ocr_engine": "easyocr",
        "ocr_lang": ["en"],
        "pdf_backend": "dlparse_v2",
        "table_mode": "fast",
        "abort_on_error": False,
        "return_as_file": False
    },
    "file_sources": [],
    "http_sources": [
        {"url": "https://ftp.idu.ac.id/wp-content/uploads/ebook/tdg/DESIGN%20SISTEM%20DAYA%20GERAK/Fundamentals%20of%20Rocket%20Propulsion.pdf"}
    ]
}
response = await async_client.post(url, json=payload)
data = response.json()
Sat Apr 26 15:46:29 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:62:00.0 Off |                  Off |
| 30%   33C    P2             81W /  450W |    5747MiB /  24564MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2655      C   /venv/main/bin/python3.10               5738MiB |
+-----------------------------------------------------------------------------------------+
I have encountered the same problem. I think it may be caused by serial (page-by-page) parsing. Can someone suggest an optimization?
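If serial parsing is the bottleneck, one application-level workaround is to split the document into page ranges and convert the chunks concurrently, each with its own converter or docling-serve request. A minimal sketch of the splitting logic, where parse_range is a stand-in for a real per-chunk conversion call:

```python
from concurrent.futures import ThreadPoolExecutor

def page_ranges(total_pages: int, chunk: int) -> list[tuple[int, int]]:
    # split [1, total_pages] into inclusive (start, end) ranges
    return [(s, min(s + chunk - 1, total_pages))
            for s in range(1, total_pages + 1, chunk)]

def parse_range(r: tuple[int, int]) -> str:
    # stand-in for converting pages r[0]..r[1] with a separate
    # DocumentConverter instance (or a separate docling-serve request)
    return f"pages {r[0]}-{r[1]} done"

ranges = page_ranges(700, 100)           # 7 chunks of up to 100 pages
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(parse_range, ranges))
print(len(results))
```

Whether this actually helps depends on where the serialization happens; if the per-page models share one GPU queue, chunking mainly overlaps the CPU-bound parsing stages.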