[BUG] Segmentation Fault during solver.solve in Python run
While running a simple Python script, the solver crashes with a segmentation fault. It also happens with teaserpp_example.py and teaser_python_ply.py. At installation, all ctest tests passed successfully. Reinstalling from the develop branch did not solve the problem, and running the script with OMP_NUM_THREADS=12 (or even 1) still produces a segmentation fault at the solver.solve step.
The Python script is run with Python 3 on Ubuntu 22.04.5 from a virtual environment with numpy==2.1.1, open3d==0.18.0, teaserpp_python==1.0.0. The machine has 24 cores and 135 GB RAM, so running out of memory should not be the issue. An example of a command that produces a segmentation fault is (run from the folder
I would appreciate it if you could give me an idea of what could go wrong and how to solve this issue.
Edit: After further debugging, I found that in the example teaser_python_3dsmooth.py, the segmentation fault is triggered at line 267: frag1.data = frag1_desc.T. Could the problem be writing to the o3d.pipelines.registration.Feature()? With Python 3.10.12, it happens with open3d 0.18.0, 0.17.0 and 0.16.0. I could not downgrade open3d further. Has anyone found a solution for this?
I would suggest cloning the data and writing the copy to frag1.data.
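One way to read this suggestion, as a minimal sketch: a transposed NumPy array is a view onto the original buffer rather than a new array, so "cloning" would mean materializing a contiguous copy before assigning it. The helper name `clone_transposed` and the array shape below are hypothetical; only `frag1.data` and `frag1_desc.T` come from the thread, and whether this avoids the crash is not confirmed here.

```python
import numpy as np


def clone_transposed(desc: np.ndarray) -> np.ndarray:
    """Return a C-contiguous copy of desc.T that owns its own memory."""
    # desc.T on its own is only a strided view of desc's buffer.
    return np.ascontiguousarray(desc.T)


# Hypothetical descriptor matrix: 100 points with 32 features each.
frag1_desc = np.random.rand(100, 32)
view = frag1_desc.T
clone = clone_transposed(frag1_desc)

# Inspect the layout of the view and of the copy.
print(view.flags["C_CONTIGUOUS"], view.flags["OWNDATA"])
print(clone.flags["C_CONTIGUOUS"], clone.flags["OWNDATA"])

# In the example script the assignment would then read:
# frag1.data = clone_transposed(frag1_desc)
```

The copy holds the same values as the view but no longer shares memory with `frag1_desc`.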
Hi guys, I encountered the same problem. It looks like the problem occurs when executing the function execute_teaser_global_registration, more specifically at line 236: teaserpp_solver.solve(source, target). I tried on both my Mac and a Linux computer and encountered the same problem. I would appreciate it if @nicolaoe or @jingnanshi could share the solution if the problem has been solved!
@Cuberkk do you mind testing a bit on your side? See whether the same behavior occurs with a small number of points, a large number of points, etc.
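A rough sketch of how such a test could be automated: since a segmentation fault kills the interpreter, each trial is run in a child process and the exit status is inspected. The helper names (`run_isolated`, `is_segfault`, `sweep`) are hypothetical; the teaserpp_python calls inside the child template are the ones quoted elsewhere in this thread. The signal check assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Child script template; the solver calls mirror the ones quoted in this thread.
CHILD_TEMPLATE = """
import numpy as np
import teaserpp_python
src = np.random.rand(3, {n})
dst = np.random.rand(3, {n})
params = teaserpp_python.RobustRegistrationSolver.Params()
solver = teaserpp_python.RobustRegistrationSolver(params)
solver.solve(src, dst)
"""


def run_isolated(code: str) -> int:
    """Run code in a fresh interpreter and return its exit status."""
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return result.returncode


def is_segfault(returncode: int) -> bool:
    """Tell whether a child exit status means it was killed by SIGSEGV."""
    # On POSIX, a child killed by a signal reports the negated signal number.
    return returncode == -signal.SIGSEGV


def sweep(point_counts):
    """Map each point count to True if the child died with SIGSEGV."""
    return {
        n: is_segfault(run_isolated(CHILD_TEMPLATE.format(n=n)))
        for n in point_counts
    }
```

Calling `sweep([10, 100, 1000, 10000])` in an environment where teaserpp_python is installed would show whether the crash depends on the input size. A `False` entry only means the child did not die with SIGSEGV; it may still have failed for another reason, such as an import error.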
I encountered the same problem. With Python 3.6 everything works well; with Python 3.9 the segfault appears.
Hi guys, I played around with building the environment and was able to run the examples in the repo in WSL (Ubuntu 16.04) with a conda environment using Python 3.6. In this environment, open3d is installed directly with pip install open3d. I can also run TEASER on a Linux 22.04 OS with a conda environment using Python 3.10. But when I use the same procedure on another computer, I still encounter the segmentation fault. So I guess there are some odd dependency issues. For Python 3.6, what I do is follow the "Reproduce the GIF Above" procedure in the git repo, with open3d installed through the command pip install open3d. I hope this helps you guys!
@Cuberkk thanks! For the segmentation fault case, can you try importing teaser at the end of all other imports and try again? Thanks!
Hey guys, I have done some debugging.
Following the minimal Python example in the README, I encounter the segfault:
sudo apt install cmake libeigen3-dev libboost-all-dev
conda create -n teaser_test python=3.10 numpy
conda activate teaser_test
pip install open3d
git clone https://github.com/MIT-SPARK/TEASER-plusplus.git
cd TEASER-plusplus && mkdir build && cd build
cmake -DTEASERPP_PYTHON_VERSION=3.10 .. && make teaserpp_python
cd python && pip install .
cd ../.. && cd examples/teaser_python_ply
python teaser_python_ply.py
Then I tried uninstalling open3d and simply using Python 3.10 with teaserpp_python, and the segfault disappeared.
Hope this hint helps.
Shijie
Hi, I was able to reproduce the problem by
'''
sudo apt install cmake libeigen3-dev libboost-all-dev
conda create -n reg python=3.10 numpy -y
conda activate reg
git clone https://github.com/MIT-SPARK/TEASER-plusplus.git
cd TEASER-plusplus && mkdir build && cd build
cmake .. -DTEASERPP_PYTHON_VERSION=3.10 && make teaserpp_python -j24
cd python && pip install .
cd ../../.. && python testTeaser.py
'''
# File name testTeaser.py
import numpy as np
import teaserpp_python
# random data
test1 = np.random.rand(3, 100)
test2 = np.random.rand(3, 100)
solver_params = teaserpp_python.RobustRegistrationSolver.Params()
solver = teaserpp_python.RobustRegistrationSolver(solver_params)
solver.solve(test1, test2)
Some testing
I don't think it is an open3d problem. I ran some debugging with gdb:
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff539d079 in pybind11::detail::type_caster<Eigen::Matrix<double, 3, -1, 0, 3, -1>, void>::load (this=0x7fffffffb3c0, src=..., convert=true) at /home/doggy/code/TEASER-plusplus/build/pybind11-src/include/pybind11/eigen/matrix.h:327
#2 0x00007ffff5391533 in pybind11::detail::argument_loader<teaser::RobustRegistrationSolver*, Eigen::Matrix<double, 3, -1, 0, 3, -1> const&, Eigen::Matrix<double, 3, -1, 0, 3, -1> const&>::load_impl_sequence<0ul, 1ul, 2ul> (this=0x7fffffffb3b0,
call=...) at /home/doggy/code/TEASER-plusplus/build/pybind11-src/include/pybind11/cast.h:1469
This suggests an issue in the conversion between the numpy array and the Eigen matrix. I tried using EigenDRef to wrap it:
// original wrapper
.def("solve", py::overload_cast<const Eigen::Matrix<double, 3, Eigen::Dynamic>&,
const Eigen::Matrix<double, 3, Eigen::Dynamic>&>(
&teaser::RobustRegistrationSolver::solve))
// EigenDRef binds functions that take Eigen::Ref parameters
.def("solve_debug", [](teaser::RobustRegistrationSolver &self,
py::EigenDRef<const Eigen::Matrix<double, 3, Eigen::Dynamic>> pcd1,
py::EigenDRef<const Eigen::Matrix<double, 3, Eigen::Dynamic>> pcd2) {
return self.solve(pcd1, pcd2);
})
with the script
# File name testTeaser.py
import numpy as np
import teaserpp_python
# random data
test1 = np.random.rand(3, 100)
test2 = np.random.rand(3, 100)
solver_params = teaserpp_python.RobustRegistrationSolver.Params()
solver = teaserpp_python.RobustRegistrationSolver(solver_params)
solver.solve_debug(test1, test2)
print("[DEBUG] EigenDRef works", end="\n\n")
solver.solve(test1, test2)
❯ python testTeaser.py
Starting scale solver (only selecting inliers if scale estimation has been disabled).
Scale estimation complete.
Max core number: 4
Num vertices: 101
Max Clique of scale estimation inliers:
17 53 76
Using chain graph for GNC rotation.
Starting rotation solver.
GNC rotation estimation noise bound:0.0252838
GNC rotation estimation noise bound squared:0.000639273
GNC-TLS solver terminated due to cost convergence.
Cost diff: 0
Iterations: 8
Rotation estimation complete.
Starting translation solver.
Translation estimation complete.
[DEBUG] EigenDRef works
[1] 3844075 segmentation fault (core dumped) python testTeaser.py
I was able to successfully run teaser_python_ply.py with the debug binding (solve_debug). Tested only on Python 3.10 and 3.11.
I think the reason is that TEASER's input expects a column-major matrix, while NumPy arrays passed through pybind11 are row-major by default. Newer versions of numpy (>=2.0.0) might have stricter rules for memory layout (although I couldn't find any documentation), which would cause a mismatch between row-major and column-major matrices.
conda create -n reg python=3.10 numpy=1.26 -y
conda activate reg
pip install open3d
git clone https://github.com/MIT-SPARK/TEASER-plusplus.git
cd TEASER-plusplus && mkdir build && cd build
cmake .. -DTEASERPP_PYTHON_VERSION=3.10 && make teaserpp_python -j24
cd python && pip install .
cd ../.. && cd examples/teaser_python_ply
python teaser_python_ply.py
However, if I downgrade numpy to 1.26, everything works fine (while 2.0.0 causes segmentation fault). To maintain compatibility with newer numpy versions, I think we may need to use EigenDRef to handle the inputs properly.
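Based only on the reports in this thread (numpy 1.26 works, 2.0.0 and 2.1.1 crash with bindings built against the bundled pybind11), a small pre-flight check could flag the risky combination before the solver is called. This is a sketch, not part of TEASER++; the helper names are hypothetical and the version boundary is taken from the thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_numpy_2_or_newer(numpy_version: str) -> bool:
    """Tell whether a version string such as '2.1.1' has major version >= 2."""
    return int(numpy_version.split(".")[0]) >= 2


numpy_version = installed_version("numpy")
if numpy_version is None:
    print("numpy is not installed in this environment")
elif is_numpy_2_or_newer(numpy_version):
    # Reported in this thread to segfault in solver.solve with the default build.
    print(f"numpy {numpy_version}: reported in this thread to crash; consider numpy 1.26")
else:
    print(f"numpy {numpy_version}: reported in this thread to work")
```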
I changed the pybind11 version from v2.11.1 to v2.13.1 on line 8 of "cmake/pybind11.CMakeLists.txt.in", then rebuilt and reinstalled, and it finally works.
that solved my problem, thank you so much!
Directly downgrading numpy to version 1.26, without rebuilding, works for me.