Docling is unusable on long PDFs with CUDA
Bug
I am running Docling on an L4 GPU from Modal Labs. For some reason it only uses 1.4 GB of VRAM and is stuck at 3% GPU utilization regardless of the num_threads value I set. In the logs I see this:
INFO:docling.document_converter:Going to convert document batch...
Apr 05 19:51:59.941 | INFO:docling.document_converter:Initializing pipeline for StandardPdfPipeline with options hash e273ea4f9afaa9373468db25fd55d8a1
Apr 05 19:51:59.956 | INFO:docling.models.factories.base_factory:Loading plugin 'docling_defaults'
Apr 05 19:51:59.961 | INFO:docling.models.factories:Registered ocr engines: ['easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
Apr 05 19:52:00.196 | INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
Apr 05 19:52:01.149 | INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
Apr 05 19:52:01.905 | INFO:docling.models.factories.base_factory:Loading plugin 'docling_defaults'
Apr 05 19:52:01.910 | INFO:docling.models.factories:Registered picture descriptions: ['vlm', 'api']
INFO:docling.pipeline.base_pipeline:Processing document wlframework.pdf
...
Steps to reproduce
I am using this code to run Docling; I only want to extract figures and tables from the PDF:
extract_images_app = modal.App(name='extract_images-app')
docling_image = modal.Image.from_registry(
    "nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04", add_python="3.10"
).run_commands(
    "apt-get update",
    "apt-get install -y software-properties-common",
    "pip install PyPDF2",
    "pip install docling",
    "docling-tools models download",
    # force_build=True
)
vol = Volume.from_name("ingestion")
@extract_images_app.function(image=docling_image, volumes={"/ingestion": vol}, gpu="L4", timeout=3600)
def extract_images(input_pdf, extract_images=True, extract_tables=True):
    from docling.datamodel.document import ConversionResult
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode, AcceleratorDevice, AcceleratorOptions
    from docling_core.types.doc import ImageRefMode, PictureItem, TableItem
    import logging
    import os

    accelerator_options = AcceleratorOptions(
        num_threads=256, device=AcceleratorDevice.CUDA
    )

    # Configure logging
    logging.basicConfig(level=logging.INFO)
    _log = logging.getLogger(__name__)

    # Path where models were downloaded during the image build
    artifacts_path = "/root/.cache/docling/models"
    pipeline_options = PdfPipelineOptions(artifacts_path=artifacts_path)
    pipeline_options.do_ocr = False
    pipeline_options.accelerator_options = accelerator_options
    if extract_images or extract_tables:
        pipeline_options.images_scale = 2.0
        pipeline_options.generate_page_images = True
        pipeline_options.generate_picture_images = True
    if extract_tables:
        pipeline_options.do_table_structure = True
        pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE

    doc_converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
    )
    conv_result = doc_converter.convert(input_pdf)
    ...
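One thing worth flagging in the snippet above: as I understand it, num_threads in AcceleratorOptions controls CPU-side threading for the inference backend, not GPU parallelism, so num_threads=256 would not raise GPU utilization and can add scheduling overhead on a machine with far fewer cores. A small sanity-check sketch (sane_num_threads is my own helper name, not a docling API):

```python
import os

# num_threads in AcceleratorOptions sets CPU-side (intra-op) threads;
# it does not increase GPU parallelism. Values far above the core
# count (like 256) mostly add scheduling overhead.
def sane_num_threads(cpu_count: int) -> int:
    # cap at the available core count, minimum 1
    return max(1, cpu_count)

requested = 256
suggested = sane_num_threads(os.cpu_count() or 1)
print(f"requested={requested}, suggested<={suggested}")
```

On a typical Modal L4 container the core count is in the single or low double digits, so the requested 256 is far above it.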
Docling version
Version: 2.28.4 ...
Python version
Python 3.10 ...
After 45 minutes, Docling has still not processed my 700-page PDF. For comparison, a 30-page PDF finishes processing in about 30 seconds.
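To put numbers on that: at the throughput observed on the small file, the 700-page PDF should finish in roughly 12 minutes, so 45+ minutes with no result points to a superlinear slowdown with document length, not just more pages:

```python
# Rough throughput check based on the timings above.
small_pages, small_seconds = 30, 30
rate = small_pages / small_seconds        # pages per second (~1.0)
expected_minutes = 700 / rate / 60        # expected time for 700 pages
print(round(expected_minutes, 1))         # ~11.7 minutes at the same rate
```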
Same here. When we try to process long PDFs/Excel files with the latest version, it is about 3-4x slower than previous versions. We tried moving to the new backend, staying on v2, etc., and the only change that improves it is switching to the Fast Table model.
Have you tried that, @JamMaster1999?
@jaluma Thanks for that! I can confirm the issue is triggered by the accurate table model.
That said, even with pipeline_options.do_ocr = False and pipeline_options.do_table_structure = False
both turned off, and accelerator_options = AcceleratorOptions(num_threads=256, device=AcceleratorDevice.CUDA), I am still seeing very little GPU memory utilization. The majority of the memory usage is in RAM, not VRAM.
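For anyone who wants hard numbers while a conversion runs, a small polling helper can log VRAM and utilization alongside docling. The nvidia-smi query flags below are standard; parse_smi and gpu_stats are my own names, a sketch rather than anything from docling:

```python
import subprocess

def parse_smi(line: str) -> tuple[int, int]:
    # line looks like "5747, 9" (MiB used, % GPU utilization)
    used, util = (int(x.strip()) for x in line.split(","))
    return used, util

def gpu_stats() -> tuple[int, int]:
    # query the first GPU; requires nvidia-smi on PATH
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_smi(out.splitlines()[0])

print(parse_smi("5747, 9"))  # (5747, 9)
```

Calling gpu_stats() in a loop with a short sleep gives a utilization trace you can correlate with docling's per-stage log lines.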
Getting a similar issue. The PDF I used was 240 pages long; it took 12 minutes to process the whole thing with CUDA.
I have an RTX 3060 Ti (12 GB), and VRAM utilization stays at 4 GB.
Can confirm on my end too. Using docling-serve with an RTX 4090 and table_mode = fast, GPU utilization is around 3%, sometimes peaking at 15%.
It took 7 minutes to process the following 483-page PDF:
https://ftp.idu.ac.id/wp-content/uploads/ebook/tdg/DESIGN%20SISTEM%20DAYA%20GERAK/Fundamentals%20of%20Rocket%20Propulsion.pdf
import httpx

async_client = httpx.AsyncClient(timeout=6000.0)
url = "http://localhost:6001/v1alpha/convert/source"
payload = {
    "options": {
        "from_formats": ["pdf", "image"],
        "to_formats": ["md"],
        "image_export_mode": "placeholder",
        "do_ocr": True,
        "pipeline": "standard",
        "force_ocr": False,
        "ocr_engine": "easyocr",
        "ocr_lang": ["en"],
        "pdf_backend": "dlparse_v2",
        "table_mode": "fast",
        "abort_on_error": False,
        "return_as_file": False
    },
    "file_sources": [],
    "http_sources": [
        {"url": "https://ftp.idu.ac.id/wp-content/uploads/ebook/tdg/DESIGN%20SISTEM%20DAYA%20GERAK/Fundamentals%20of%20Rocket%20Propulsion.pdf"}
    ]
}
response = await async_client.post(url, json=payload)
data = response.json()
Sat Apr 26 15:46:29 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:62:00.0 Off |                  Off |
| 30%   33C    P2             81W /  450W |    5747MiB /  24564MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2655      C   /venv/main/bin/python3.10               5738MiB |
+-----------------------------------------------------------------------------------------+
I have encountered the same problem. I think it may be caused by serial (page-by-page) parsing. Can someone suggest an optimization?
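If serial parsing is the bottleneck, one application-level workaround is to split the document into page ranges and convert the chunks concurrently, each with its own converter or docling-serve request. A minimal sketch of the splitting logic, where parse_range is a stand-in for a real per-chunk conversion call:

```python
from concurrent.futures import ThreadPoolExecutor

def page_ranges(total_pages: int, chunk: int) -> list[tuple[int, int]]:
    # split [1, total_pages] into inclusive (start, end) ranges
    return [(s, min(s + chunk - 1, total_pages))
            for s in range(1, total_pages + 1, chunk)]

def parse_range(r: tuple[int, int]) -> str:
    # stand-in for converting pages r[0]..r[1] with a separate
    # DocumentConverter instance (or a separate docling-serve request)
    return f"pages {r[0]}-{r[1]} done"

ranges = page_ranges(700, 100)           # 7 chunks of up to 100 pages
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(parse_range, ranges))
print(len(results))
```

Whether this actually helps depends on where the serialization happens; if the per-page models share one GPU queue, chunking mainly overlaps the CPU-bound parsing stages.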