Error building extension 'MultiScaleDeformableAttention' when running the sample from the website.
Bug
When running the sample app, I get these errors:
Could not load the custom kernel for multi-scale deformable attention: Error building extension 'MultiScaleDeformableAttention': [1/2] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output ms_deform_attn_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/TH -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++17 -c /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu -o ms_deform_attn_cuda.cuda.o
FAILED: ms_deform_attn_cuda.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output ms_deform_attn_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/TH -isystem /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++17 -c /mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu -o ms_deform_attn_cuda.cuda.o
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu:19:9: warning: #pragma once in main file
19 | #pragma once
| ^~~~
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu:19:9: warning: #pragma once in main file
19 | #pragma once
| ^~~~
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(261): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_im2col_cuda(cudaStream_t, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(69): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(762): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(872): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(331): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(436): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(544): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_im2col_cuda.cuh(649): warning #177-D: variable "q_col" was declared but never referenced
detected during instantiation of "void ms_deformable_col2im_cuda(cudaStream_t, const scalar_t *, const scalar_t *, const int64_t *, const int64_t *, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, scalar_t *, scalar_t *, scalar_t *) [with scalar_t=double]"
/mnt/programming/CurrentDevelopment/DoclingTest/venv/lib/python3.10/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu(140): here
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
ninja: build stopped: subcommand failed.
...
Steps to reproduce
pip install docling
Then create a script with:
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869" # PDF path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "### Docling Technical Report[...]"
Running this code produces the error above.
...
Docling version
Docling version: 2.12.0 Docling Core version: 2.10.0 Docling IBM Models version: 3.1.0 Docling Parse version: 3.0.0 ...
Python version
Python 3.10.12
...
Do you get the error when installing Docling, i.e. pip install docling or when running it?
The error you posted looks like a compilation issue which, in that case, would happen during the install phase.
It was happening when I ran the sample script:
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869" # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
Yeah, I didn't get that either. It seemed to be trying to compile something in C++, if I recall correctly, before running the script.
At some point it ceased doing that. It now gives me this message:
/doclingtest/venv/lib/python3.10/site-packages/torch/cuda/init.py:129: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.) return torch._C._cuda_getDeviceCount() > 0
and then it goes on to run, but without using my GPU (RTX 3050). I have CUDA 12.6 and the CUDA Toolkit 12.6 installed.
We never encounter (yet) such an error. I'm a bit suspicious about the CUDA and Pytorch versions. I would recommend making sure torch and cuda are compatible, e.g. using the install methods listed on https://pytorch.org/.
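A quick way to act on that advice is to compare the CUDA version PyTorch was built against with the system-wide nvcc. This is a minimal diagnostic sketch, not part of Docling; it assumes torch is installed in the active environment, and degrades gracefully if nvcc is not on the PATH:

```python
import re
import subprocess

def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    m = re.search(r"release (\d+)\.(\d+)", output)
    return (int(m.group(1)), int(m.group(2))) if m else None

def report_cuda_versions():
    """Print the CUDA version torch was built with next to the system nvcc.
    A major-version mismatch is a common cause of JIT kernel build failures."""
    import torch  # assumes torch is installed in the active environment
    built = torch.version.cuda  # e.g. "12.4"; None for CPU-only builds
    try:
        out = subprocess.run(["nvcc", "--version"],
                             capture_output=True, text=True).stdout
    except FileNotFoundError:
        out = ""  # no system-wide CUDA toolkit on PATH
    print("torch built with CUDA:", built)
    print("system nvcc release:", parse_nvcc_release(out))
```

If the two major versions differ (e.g. nvcc 11.5 vs torch built for 12.4, as reported later in this thread), reinstalling torch per the selector on pytorch.org is the first thing to try.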
I got the same error, but I can still convert to Markdown. The error may disappear after you update your GCC version.
/lib/python3.9/site-packages/torch/include/c10/util/C++17.h:13:2: error: #error "You're trying to build PyTorch with a too old version of GCC. We need GCC 9 or later."
I am having the same issue after updating the package today.
We never encounter (yet) such an error. I'm a bit suspicious about the CUDA and Pytorch versions. I would recommend making sure torch and cuda are compatible, e.g. using the install methods listed on https://pytorch.org/.
Uninstalled and reinstalled cuda-toolkit and that seemed to have fixed it. Thanks.
I am having the same issue, too.
Installing latest pytorch didn't fix it right away. But installing updated CUDA for WSL fixed it.
Hello, this problem occurs reliably on a default install of Ubuntu 22.04 + docling from PyPI. Basically it's the usual C++ being incompatible with C++ and CUDA being incompatible with CUDA nonsense. The issue and the solution are documented here: https://github.com/NVIDIA/nccl/issues/650
If reinstalling or updating CUDA doesn't solve it (try that first) then this will:
sudo apt install gcc-10 g++-10
export CC=gcc-10 CXX=g++-10
The CUDA toolkit version installed system wide (and thus the nvcc version) is 11.5.1-1ubuntu1 while what torch pulls in is 12.4.127, which may have something to do with it, but in any case convincing nvcc to use GCC 10 solves the problem.
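The `export` lines above work because torch's JIT extension build honors the CC/CXX environment variables; setting them before anything imports docling or torch has the same effect from within Python. A small sketch (the helper name is mine, not Docling's):

```python
import os

def select_host_compiler(env, cc="gcc-10", cxx="g++-10"):
    """Point the C/C++ host compiler at GCC 10 without clobbering values
    the user already exported. Mutates and returns the given mapping."""
    env.setdefault("CC", cc)
    env.setdefault("CXX", cxx)
    return env

# Apply to the real environment *before* importing docling or torch:
# select_host_compiler(os.environ)
```

Using `setdefault` means an explicit `export CC=...` in the shell still wins over the in-script default.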
This should be reopened - it actually results in a failure when running docling on the command-line:
$ docling my-awesome-pdf.pdf
# (endless meaningless C++ nonsense skipped)
Could not load the custom kernel for multi-scale deformable attention: /home/dhdaines/.cache/torch_extensions/py310_cu121/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
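When the .so is reported missing from `~/.cache/torch_extensions`, a half-finished build from an earlier failed compile may be sitting in that cache; deleting it forces a clean rebuild on the next run. A sketch assuming the default torch extension cache layout (directory names like `py310_cu121` vary per environment):

```python
import shutil
from pathlib import Path

def clear_extension_cache(name="MultiScaleDeformableAttention",
                          cache_root=None):
    """Delete cached copies of a torch JIT extension (e.g. a half-built
    MultiScaleDeformableAttention build dir) so the next run rebuilds it
    from scratch. Returns the paths that were removed."""
    root = Path(cache_root) if cache_root else (
        Path.home() / ".cache" / "torch_extensions")
    removed = []
    if root.exists():
        # Cache layout is <root>/<pyXY_cuZZZ>/<extension name>/
        for d in root.glob(f"*/{name}"):
            shutil.rmtree(d)
            removed.append(d)
    return removed
```

This only clears the stale artifact; the underlying compiler/toolkit mismatch still needs one of the fixes discussed above.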
Agreed this needs to be reopened. I've been trying to use Docling on large files; this seems to be an important issue that's either blocking CUDA altogether or is slowing it down considerably.
This might really be a bug in Transformers however as the source of the problem is the custom kernel in transformers/kernels/deformable_detr: https://github.com/huggingface/transformers/tree/7eecdf2a8650306ed5fbb6150c64f99f587e004d/src/transformers/kernels/deformable_detr/
Hello, this problem occurs reliably on a default install of Ubuntu 22.04 + docling from PyPI. Basically it's the usual C++ being incompatible with C++ and CUDA being incompatible with CUDA nonsense. The issue and the solution are documented here: NVIDIA/nccl#650
If reinstalling or updating CUDA doesn't solve it (try that first) then this will:
sudo apt install gcc-10 g++-10
export CC=gcc-10 CXX=g++-10
The CUDA toolkit version installed system-wide (and thus the nvcc version) is 11.5.1-1ubuntu1 while what torch pulls in is 12.4.127, which may have something to do with it, but in any case convincing nvcc to use GCC 10 solves the problem.
Fixed for me on WSL with just the gcc-10 install and export; no CUDA toolkit reinstall or update needed.
@dromeuf what was your Ubuntu and CUDA versions? I tried this on a fresh container with latest cuda (from Nvidia's image: Ubuntu 24.04 + CUDA 12.8.0) with no luck.
WSL Ubuntu LTS 22.04.5 + nvcc 11.5 + gcc10
I've since upgraded my ubuntu dist to 24.04 and it works too. Ubuntu 24.04.1 LTS. nvcc release 12.0, V12.0.140
We spent quite some time on this issue in the past few days. It turned out that some Docling dependency relies on MultiScaleDeformableAttention.so, which needs to be built first; this happens at the latest when trying to convert a document. For this, the CUDA development files were needed.
This issue explains it further.
The following setup worked:
- python-3.12 and python3.12-dev
- [email protected] and [email protected]
- Latest version of Docling
- nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04 (this is probably not suitable for production, because the image is pretty large)
  - Do note that this image doesn't have Python pre-installed, so you need to install it from the deadsnakes PPA
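The setup above boils down to two build-time prerequisites: nvcc (from the CUDA development files, which the -devel image provides) and the Python headers (from the -dev package). A small sketch to check both on the current machine (helper name is mine):

```python
import shutil
import sysconfig
from pathlib import Path

def build_prereqs():
    """Check the two build-time prerequisites for JIT-compiling the kernel:
    nvcc on the PATH and the CPython headers (e.g. python3.12-dev)."""
    include_dir = Path(sysconfig.get_paths()["include"])
    return {
        "nvcc": shutil.which("nvcc") is not None,
        "python_headers": (include_dir / "Python.h").exists(),
    }
```

If either entry is False, the extension build will fail before any compiler-version issue even comes into play.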
It appears this is solved with more recent transformers / torch versions. Please feel free to re-open if you still see the issue.