[BUG] Segmentation Fault during solver.solve in Python run

Open nicolaoe opened this issue 1 year ago • 11 comments

While running a simple Python script, the solver crashes with a segmentation fault. The same happens with teaserpp_example.py and teaser_python_ply.py. After installation, all ctest tests passed. Reinstalling from the develop branch did not solve the problem, and running the script with OMP_NUM_THREADS=12 (or even 1) still produces a segmentation fault at the solver.solve step.

The script is run with Python 3 on Ubuntu 22.04.5 from a virtual environment with numpy==2.1.1, open3d==0.18.0, teaserpp_python==1.0.0. The machine has 24 cores and 135 GB RAM, so running out of memory should not be the issue. An example of a command that produces a segmentation fault (run from the folder /TEASER-plusplus-develop/python/teaserpp_python/) is: OMP_NUM_THREADS="12" python3 ./teaserpp_example.py image

I would appreciate it if you could give me an idea of what could go wrong and how to solve this issue.

Edit: After further debugging, I found that in the example teaser_python_3dsmooth.py, the segmentation fault is triggered at line 267: frag1.data = frag1_desc.T. The problem might be the write to o3d.pipelines.registration.Feature(). With Python 3.10.12, it happens with open3d 0.18.0, 0.17.0 and 0.16.0; I could not downgrade open3d further. Has anyone found a solution?

nicolaoe avatar Sep 30 '24 09:09 nicolaoe

I would suggest cloning the data and writing the copy to frag1.data.

jingnanshi avatar Oct 10 '24 20:10 jingnanshi
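
The suggestion above to clone the data before assigning it can be sketched with numpy alone. This is a minimal illustration, not the maintainers' fix: `frag1_desc` here is a hypothetical random stand-in for the descriptor matrix in teaser_python_3dsmooth.py, and the final assignment to `frag1.data` is left as a comment because it needs open3d.

```python
import numpy as np

# Hypothetical stand-in for the descriptor matrix from the example script;
# the real frag1_desc is produced by a feature extractor.
frag1_desc = np.random.rand(100, 32)

# Transposing returns a view: it shares memory with frag1_desc and is not
# C-contiguous.
view = frag1_desc.T

# An explicit, independent float64 copy with a contiguous layout.
clone = np.array(frag1_desc.T, dtype=np.float64, order="C", copy=True)

print(np.shares_memory(frag1_desc, view))    # True
print(np.shares_memory(frag1_desc, clone))   # False
print(view.flags["C_CONTIGUOUS"], clone.flags["C_CONTIGUOUS"])  # False True

# In the example script the copy would then be assigned instead of the view:
# frag1.data = clone
```

Whether this avoids the crash at line 267 is not confirmed anywhere in this thread; it only shows what "cloning" changes about the array that is handed over.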

Hi guys, I encountered the same problem. It seems to occur when executing the function execute_teaser_global_registration, more specifically at line 236: teaserpp_solver.solve(source, target). I tried on both my Mac and a Linux computer and hit the same problem on each. I would appreciate it if @nicolaoe or @jingnanshi could share the solution if the problem has been solved!

Cuberkk avatar Jan 20 '25 20:01 Cuberkk

@Cuberkk do you mind testing a bit on your side? See whether the same behavior occurs with a small number of points, a large number of points, etc.

jingnanshi avatar Jan 20 '25 22:01 jingnanshi
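
One way to run the small-versus-large test requested above is to launch each case in a separate interpreter, so that a segmentation fault ends only the child process and the sweep can continue. The sketch below is an assumption about how one might set this up, not something posted in the thread: `run_isolated` and `sweep` are hypothetical helper names, and the child script mirrors the minimal solver script that appears further down this thread.

```python
import subprocess
import sys

# Script run in each child process. It assumes numpy and teaserpp_python are
# installed; the solver calls follow the minimal script posted in this thread.
CHILD_TEMPLATE = """
import numpy as np
import teaserpp_python
src = np.random.rand(3, {n})
dst = np.random.rand(3, {n})
params = teaserpp_python.RobustRegistrationSolver.Params()
solver = teaserpp_python.RobustRegistrationSolver(params)
solver.solve(src, dst)
"""


def run_isolated(code, timeout=300):
    """Run `code` in a fresh interpreter and return its exit status.

    On POSIX a negative status means the child was killed by a signal
    (for example -11 for SIGSEGV), so the parent survives the crash.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.returncode


def sweep(sizes, template=CHILD_TEMPLATE):
    """Map each point count to the exit status of its child process."""
    return {n: run_isolated(template.format(n=n)) for n in sizes}


if __name__ == "__main__":
    for n, status in sweep([10, 100, 1000, 10000]).items():
        label = "ok" if status == 0 else f"failed (exit status {status})"
        print(f"{n:>6} points: {label}")
```

A status of 1 usually means the child raised a Python exception (for example, a missing package), which is a different failure from a crash reported as a negative status.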

I encountered the same problem. With Python 3.6 everything works well; with Python 3.9 the segfault appears.

eleboss avatar Jan 27 '25 05:01 eleboss

Hi guys, I played around with the environment setup and was able to run the examples in the repo in WSL (Ubuntu 16.04) with a conda environment using Python 3.6. In this environment, open3d was installed directly with pip install open3d. I can also run teaser on a Linux 22.04 OS with a conda environment using Python 3.10. But when I follow the same procedure on another computer, I still get the segmentation fault, so I guess there are some odd dependency issues. For Python 3.6, I followed the "Reproduce the GIF Above" procedure in the git repo, with open3d installed through the command: pip install open3d. I hope this helps you guys!

Cuberkk avatar Jan 27 '25 16:01 Cuberkk

@Cuberkk thanks! For the segmentation fault case, can you try importing teaser at the end of all other imports and try again? Thanks!

jingnanshi avatar Jan 27 '25 20:01 jingnanshi

Hey guys, I have done some debugging.

Following the minimal Python example in the README, I hit the segfault:

sudo apt install cmake libeigen3-dev libboost-all-dev
conda create -n teaser_test python=3.10 numpy
conda activate teaser_test
pip install open3d
git clone https://github.com/MIT-SPARK/TEASER-plusplus.git
cd TEASER-plusplus && mkdir build && cd build
cmake -DTEASERPP_PYTHON_VERSION=3.10 .. && make teaserpp_python
cd python && pip install .
cd ../.. && cd examples/teaser_python_ply
python teaser_python_ply.py

Then I uninstalled open3d and used only Python 3.10 with teaser-pp, and the segfault disappeared.

Hope this hint helps.

Shijie

eleboss avatar Jan 29 '25 17:01 eleboss

Hi, I was able to reproduce the problem by

'''
sudo apt install cmake libeigen3-dev libboost-all-dev
conda create -n reg python=3.10 numpy -y
conda activate reg
git clone https://github.com/MIT-SPARK/TEASER-plusplus.git
cd TEASER-plusplus && mkdir build && cd build
cmake .. -DTEASERPP_PYTHON_VERSION=3.10 && make teaserpp_python -j24
cd python && pip install .
cd ../../.. && python testTeaser.py
'''
# File name testTeaser.py
import numpy as np
import teaserpp_python

# random data
test1 = np.random.rand(3, 100)
test2 = np.random.rand(3, 100)

solver_params = teaserpp_python.RobustRegistrationSolver.Params()
solver = teaserpp_python.RobustRegistrationSolver(solver_params)
solver.solve(test1, test2)

After some testing, I don't think it is an open3d problem. I did some debugging with gdb:
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff539d079 in pybind11::detail::type_caster<Eigen::Matrix<double, 3, -1, 0, 3, -1>, void>::load (this=0x7fffffffb3c0, src=..., convert=true) at /home/doggy/code/TEASER-plusplus/build/pybind11-src/include/pybind11/eigen/matrix.h:327
#2  0x00007ffff5391533 in pybind11::detail::argument_loader<teaser::RobustRegistrationSolver*, Eigen::Matrix<double, 3, -1, 0, 3, -1> const&, Eigen::Matrix<double, 3, -1, 0, 3, -1> const&>::load_impl_sequence<0ul, 1ul, 2ul> (this=0x7fffffffb3b0, 
    call=...) at /home/doggy/code/TEASER-plusplus/build/pybind11-src/include/pybind11/cast.h:1469

The backtrace suggests an issue in the conversion between the numpy array and the Eigen matrix. I tried using EigenDRef to wrap the inputs:

// original wrapper
.def("solve", py::overload_cast<const Eigen::Matrix<double, 3, Eigen::Dynamic>&,
                                const Eigen::Matrix<double, 3, Eigen::Dynamic>&>(
                  &teaser::RobustRegistrationSolver::solve))
// EigenDRef binds functions that take Eigen::Ref parameters
.def("solve_debug", [](teaser::RobustRegistrationSolver &self,
                       py::EigenDRef<const Eigen::Matrix<double, 3, Eigen::Dynamic>> pcd1,
                       py::EigenDRef<const Eigen::Matrix<double, 3, Eigen::Dynamic>> pcd2) {
                      return self.solve(pcd1, pcd2);
                      })

with the script

# File name testTeaser.py
import numpy as np
import teaserpp_python

# random data
test1 = np.random.rand(3, 100)
test2 = np.random.rand(3, 100)

solver_params = teaserpp_python.RobustRegistrationSolver.Params()
solver = teaserpp_python.RobustRegistrationSolver(solver_params)
solver.solve_debug(test1, test2) 
print("[DEBUG] EigenDRef works", end="\n\n")

solver.solve(test1, test2)
❯ python testTeaser.py
Starting scale solver (only selecting inliers if scale estimation has been disabled).
Scale estimation complete.
Max core number: 4
Num vertices: 101
Max Clique of scale estimation inliers: 
17 53 76 
Using chain graph for GNC rotation.
Starting rotation solver.
GNC rotation estimation noise bound:0.0252838
GNC rotation estimation noise bound squared:0.000639273
GNC-TLS solver terminated due to cost convergence.
Cost diff: 0
Iterations: 8
Rotation estimation complete.
Starting translation solver.
Translation estimation complete.
[DEBUG] EigenDRef works

[1]    3844075 segmentation fault (core dumped)  python testTeaser.py

I was able to successfully run teaser_python_ply.py with the debug binding (solve_debug). Tested only on Python 3.10 and 3.11.


I think the reason is that teaser's input expects a column-major matrix, while pybind11 defaults to a row-major matrix. Newer versions of numpy (>=2.0.0) might have stricter rules for memory layout (although I couldn't find any documentation), which would cause the mismatch between row-major and column-major matrices.

conda create -n reg python=3.10 numpy=1.26 -y
conda activate reg
pip install open3d
git clone https://github.com/MIT-SPARK/TEASER-plusplus.git
cd TEASER-plusplus && mkdir build && cd build
cmake .. -DTEASERPP_PYTHON_VERSION=3.10 && make teaserpp_python -j24
cd python && pip install .
cd ../.. && cd examples/teaser_python_ply 
python teaser_python_ply.py

However, if I downgrade numpy to 1.26 as in the setup above, everything works fine (while 2.0.0 causes a segmentation fault). To maintain compatibility with newer numpy versions, I think we may need to use EigenDRef to handle the inputs properly.

doggydoggy0101 avatar Feb 03 '25 06:02 doggydoggy0101
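
The row-major versus column-major distinction raised in the comment above can be inspected directly from Python. The sketch below only shows how to check an array's layout and how to build an explicit column-major float64 copy (Eigen's default storage order); it assumes nothing about TEASER++ itself, and the thread does not test whether passing such a copy to solver.solve changes the outcome.

```python
import numpy as np

# Same shape as the random test input used in the comment above.
points = np.random.rand(3, 100)

# numpy allocates new arrays in C (row-major) order by default.
print(points.flags["C_CONTIGUOUS"], points.flags["F_CONTIGUOUS"])      # True False

# An explicit column-major (Fortran-order) float64 copy.
points_f = np.asfortranarray(points, dtype=np.float64)
print(points_f.flags["C_CONTIGUOUS"], points_f.flags["F_CONTIGUOUS"])  # False True

# Same values, different memory layout: the strides (in bytes) swap roles.
print(np.array_equal(points, points_f))   # True
print(points.strides, points_f.strides)   # (800, 8) (8, 24)
```

Printing the flags of the arrays actually passed to the solver is a cheap way to check the layout hypothesis on a machine that reproduces the crash.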

I changed the pybind11 version from v2.11.1 to v2.13.1 on line 8 of "cmake/pybind11.CMakeLists.txt.in", then rebuilt and reinstalled, and it finally works.

zhaoys87 avatar Feb 21 '25 06:02 zhaoys87

@doggydoggy0101 that solved my problem, thank you so much!

amberhappy avatar Apr 14 '25 06:04 amberhappy

Directly downgrading numpy to version 1.26, without rebuilding, works for me.

OuYaozhong avatar Jun 05 '25 08:06 OuYaozhong
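
The reports in this thread (numpy 1.26 works; numpy 2.0.0 and 2.1.1 crash with the default build; a rebuild against pybind11 v2.13.1 works) can be turned into a small pre-flight check. The helper names below are hypothetical and the threshold is taken only from those reports, so treat it as a sketch rather than an official compatibility rule.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '2.1.1'."""
    return int(version.split(".")[0])


def numpy_needs_attention(numpy_version):
    """True for numpy 2.x and later, the range reported to crash in this thread."""
    return major_version(numpy_version) >= 2


def installed_numpy_version():
    """Return the installed numpy version string, or None if numpy is absent."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if found is None:
        print("numpy is not installed")
    elif numpy_needs_attention(found):
        print(f"numpy {found}: consider numpy 1.26 or a rebuild against pybind11 v2.13.1")
    else:
        print(f"numpy {found}: matches the configuration reported to work")
```

Running this before importing teaserpp_python gives a readable message in place of a bare segmentation fault.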