Error building extension 'MultiScaleDeformableAttention' when running the sample from the website.
Bug
When running the sample app, I get these errors:
Could not load the custom kernel for multi-scale deformable attention: Error building extension 'MultiScaleDeformableAttention': [1/2] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output ms_deform_attn_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/TH -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++17 -c /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu -o ms_deform_attn_cuda.cuda.o
FAILED: ms_deform_attn_cuda.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output ms_deform_attn_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/TH -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++17 -c /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu -o ms_deform_attn_cuda.cuda.o
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu:19:9: warning: #pragma once in main file
19 | #pragma once
| ^~~~
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu:19:9: warning: #pragma once in main file
19 | #pragma once
| ^~~~
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(261): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_im2col_cuda(cudaStream_t, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(69): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(762): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(872): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(331): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(436): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(544): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(649): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
ninja: build stopped: subcommand failed.
...
Steps to reproduce
pip install docling
Then create a script with:
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869" # PDF path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "### Docling Technical Report[...]"
Running this code produces the error above.
...
Docling version
Docling version: 2.12.0 Docling Core version: 2.10.0 Docling IBM Models version: 3.1.0 Docling Parse version: 3.0.0 ...
Python version
Python 3.10.12
...
Do you get the error when installing Docling, i.e. pip install docling or when running it?
The error you posted looks like a compilation issue which, in that case, would happen during the install phase.
It was happening when I ran the sample script:
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869" # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
Yeah, I didn't get that either. It seemed to be trying to compile something in C++, if I recall correctly, before running the script.
At some point it ceased doing that. It now gives me this message:
/doclingtest/venv/lib/python3.10/site-packages/torch/cuda/init.py:129: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.) return torch._C._cuda_getDeviceCount() > 0
and then it goes on to run, but without using my GPU (RTX 3050). I have CUDA 12.6 and the CUDA Toolkit 12.6 installed.
We never encounter (yet) such an error. I'm a bit suspicious about the CUDA and Pytorch versions. I would recommend making sure torch and cuda are compatible, e.g. using the install methods listed on https://pytorch.org/.
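A quick way to act on that advice is to compare the CUDA version PyTorch was built against with the system-wide nvcc. This is a minimal diagnostic sketch, not part of Docling; it assumes torch is installed in the active environment, and degrades gracefully if nvcc is not on the PATH:

```python
import re
import subprocess

def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    m = re.search(r"release (\d+)\.(\d+)", output)
    return (int(m.group(1)), int(m.group(2))) if m else None

def report_cuda_versions():
    """Print the CUDA version torch was built with next to the system nvcc.
    A major-version mismatch is a common cause of JIT kernel build failures."""
    import torch  # assumes torch is installed in the active environment
    built = torch.version.cuda  # e.g. "12.4"; None for CPU-only builds
    try:
        out = subprocess.run(["nvcc", "--version"],
                             capture_output=True, text=True).stdout
    except FileNotFoundError:
        out = ""  # no system-wide CUDA toolkit on PATH
    print("torch built with CUDA:", built)
    print("system nvcc release:", parse_nvcc_release(out))
```

If the two major versions differ (e.g. nvcc 11.5 vs torch built for 12.4, as reported later in this thread), reinstalling torch per the selector on pytorch.org is the first thing to try.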
I got the same error, but I can still convert to Markdown. The error may disappear after you update your GCC version.
/lib/python3.9/site-packages/torch/include/c10/util/C++17.h:13:2: error: #error "You're trying to build PyTorch with a too old version of GCC. We need GCC 9 or later."
I am having the same issue after updating the package today.
We never encounter (yet) such an error. I'm a bit suspicious about the CUDA and Pytorch versions. I would recommend making sure torch and cuda are compatible, e.g. using the install methods listed on https://pytorch.org/.
Uninstalled and reinstalled cuda-toolkit and that seemed to have fixed it. Thanks.
I am having the same issue, too.
Installing latest pytorch didn't fix it right away. But installing updated CUDA for WSL fixed it.
Hello, this problem occurs reliably on a default install of Ubuntu 22.04 + docling from PyPI. Basically it's the usual C++ being incompatible with C++ and CUDA being incompatible with CUDA nonsense. The issue and the solution are documented here: https://github.com/NVIDIA/nccl/issues/650
If reinstalling or updating CUDA doesn't solve it (try that first) then this will:
sudo apt install gcc-10 g++-10
export CC=gcc-10 CXX=g++-10
The CUDA toolkit version installed system wide (and thus the nvcc version) is 11.5.1-1ubuntu1 while what torch pulls in is 12.4.127, which may have something to do with it, but in any case convincing nvcc to use GCC 10 solves the problem.
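The `export` lines above work because torch's JIT extension build honors the CC/CXX environment variables; setting them before anything imports docling or torch has the same effect from within Python. A small sketch (the helper name is mine, not Docling's):

```python
import os

def select_host_compiler(env, cc="gcc-10", cxx="g++-10"):
    """Point the C/C++ host compiler at GCC 10 without clobbering values
    the user already exported. Mutates and returns the given mapping."""
    env.setdefault("CC", cc)
    env.setdefault("CXX", cxx)
    return env

# Apply to the real environment *before* importing docling or torch:
# select_host_compiler(os.environ)
```

Using `setdefault` means an explicit `export CC=...` in the shell still wins over the in-script default.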
This should be reopened - it actually results in a failure when running docling on the command-line:
$ docling my-awesome-pdf.pdf
# (endless meaningless C++ nonsense skipped)
Could not load the custom kernel for multi-scale deformable attention: /home/dhdaines/.cache/torch_extensions/py310_cu121/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
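When the .so is reported missing from `~/.cache/torch_extensions`, a half-finished build from an earlier failed compile may be sitting in that cache; deleting it forces a clean rebuild on the next run. A sketch assuming the default torch extension cache layout (directory names like `py310_cu121` vary per environment):

```python
import shutil
from pathlib import Path

def clear_extension_cache(name="MultiScaleDeformableAttention",
                          cache_root=None):
    """Delete cached copies of a torch JIT extension (e.g. a half-built
    MultiScaleDeformableAttention build dir) so the next run rebuilds it
    from scratch. Returns the paths that were removed."""
    root = Path(cache_root) if cache_root else (
        Path.home() / ".cache" / "torch_extensions")
    removed = []
    if root.exists():
        # Cache layout is <root>/<pyXY_cuZZZ>/<extension name>/
        for d in root.glob(f"*/{name}"):
            shutil.rmtree(d)
            removed.append(d)
    return removed
```

This only clears the stale artifact; the underlying compiler/toolkit mismatch still needs one of the fixes discussed above.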
Agreed this needs to be reopened. I've been trying to use Docling on large files; this seems to be an important issue that's either blocking CUDA altogether or is slowing it down considerably.
This might really be a bug in Transformers however as the source of the problem is the custom kernel in transformers/kernels/deformable_detr: https://github.com/huggingface/transformers/tree/7eecdf2a8650306ed5fbb6150c64f99f587e004d/src/transformers/kernels/deformable_detr/
Hello, this problem occurs reliably on a default install of Ubuntu 22.04 + docling from PyPI. Basically it's the usual C++ being incompatible with C++ and CUDA being incompatible with CUDA nonsense. The issue and the solution are documented here: NVIDIA/nccl#650
If reinstalling or updating CUDA doesn't solve it (try that first) then this will:
sudo apt install gcc-10 g++-10
export CC=gcc-10 CXX=g++-10
The CUDA toolkit version installed system-wide (and thus the nvcc version) is 11.5.1-1ubuntu1 while what torch pulls in is 12.4.127, which may have something to do with it, but in any case convincing nvcc to use GCC 10 solves the problem.
Fixed for me on WSL with just the gcc-10 install and export; no CUDA toolkit reinstall or update needed.
@dromeuf what was your Ubuntu and CUDA versions? I tried this on a fresh container with latest cuda (from Nvidia's image: Ubuntu 24.04 + CUDA 12.8.0) with no luck.
WSL Ubuntu LTS 22.04.5 + nvcc 11.5 + gcc10
I've since upgraded my ubuntu dist to 24.04 and it works too. Ubuntu 24.04.1 LTS. nvcc release 12.0, V12.0.140
We spent quite some time on this issue in the past few days. It turned out that some Docling dependency relies on MultiScaleDeformableAttention.so, which needs to be built first; this happens at the latest when trying to convert a document. For this, the CUDA development files were needed.
This issue explains it further.
The following setup worked:
- python-3.12 and python3.12-dev
- [email protected] and [email protected]
- Latest version of Docling
- nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04 (this is probably not suitable for production, because the image is pretty large)
  - Do note that this image doesn't have Python pre-installed, so you need to install it from the deadsnakes PPA
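The setup above boils down to two build-time prerequisites: nvcc (from the CUDA development files, which the -devel image provides) and the Python headers (from the -dev package). A small sketch to check both on the current machine (helper name is mine):

```python
import shutil
import sysconfig
from pathlib import Path

def build_prereqs():
    """Check the two build-time prerequisites for JIT-compiling the kernel:
    nvcc on the PATH and the CPython headers (e.g. python3.12-dev)."""
    include_dir = Path(sysconfig.get_paths()["include"])
    return {
        "nvcc": shutil.which("nvcc") is not None,
        "python_headers": (include_dir / "Python.h").exists(),
    }
```

If either entry is False, the extension build will fail before any compiler-version issue even comes into play.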
It appears this is solved with more recent transformers / torch versions. Please feel free to re-open if you still see the issue.