
Loop detection freezing with SuperGlue, please fix it. Test on 2022-IlluminationInvariant.

Open cdb0y511 opened this issue 3 years ago • 6 comments

Hi @matlabbe. This happens with the latest git. To reproduce: I tested with the first database, loc_190321-165128.db, from https://github.com/introlab/rtabmap/blob/71a28bb570e26f6bbcb78bd7a95ea75c24a4d4d8/archive/2022-IlluminationInvariant/README.md. Load the source from the database and use the odometry saved in it. Set Kp/DetectorStrategy and Vis/FeatureType to 11 (SuperPoint), and Vis/CorNNType to 6 (SuperGlue). It freezes after a few detections, as shown below:

Screenshot 2022-09-09 17-11-56

Odometry seems to continue, but loop detection stops. I can press stop, but I cannot close the database and have to force-kill the app. When I go back to the latest release (https://github.com/introlab/rtabmap/releases/tag/0.20.16):

Screenshot 2022-09-09 17-12-02

everything is OK and SuperGlue works. Btw, SuperGlue works for odometry but freezes with loop detection. Environment: Ubuntu 20.04, tested with both CUDA 11.6 and 11.7, libtorch 1.8.2. I think a recent commit introduced an issue in loop detection. I hope you can fix it soon. Thanks, @matlabbe

cdb0y511 avatar Sep 09 '22 09:09 cdb0y511

Btw, SuperPoint with kd-tree matching works with loop detection, but it freezes with SuperGlue. You may want to check the recent commits touching loop detection; otherwise the results of the paper https://doi.org/10.3389/frobt.2022.801886 cannot be reproduced with the latest git.

cdb0y511 avatar Sep 09 '22 09:09 cdb0y511

Currently working on a docker image to reproduce the results (working so far, but there are minor issues to fix this week, maybe today if I have time). There is, however, a known issue with loading python scripts in rtabmap when more than one thread uses Python at the same time. The results presented in the paper were generated with the rtabmap-reprocess tool, which runs in a single thread, so it does not hit the Python multi-threading issue that the standalone UI app has.

Related to https://github.com/introlab/rtabmap_ros/issues/534

matlabbe avatar Sep 12 '22 18:09 matlabbe

Updated README with a docker example: https://github.com/introlab/rtabmap/tree/master/archive/2022-IlluminationInvariant#docker

matlabbe avatar Sep 13 '22 07:09 matlabbe

@matlabbe, thanks, I will look into it. I hope this will be fixed for the standalone app soon. I wonder if it is related to the python version (currently python 3.8; could 3.9 avoid this issue?).

cdb0y511 avatar Sep 14 '22 02:09 cdb0y511

Reproduced the problem on standalone, stuck on:

SuperGlue python init()

It seems to freeze when initializing superglue: https://github.com/introlab/rtabmap/blob/adfb250d4ee33641ce7057bdb34c748127259120/corelib/src/python/rtabmap_superglue.py#L25

To reproduce:

export XAUTH=/tmp/.docker.xauth
touch $XAUTH
xauth nlist $DISPLAY | sed -e 's/^..../ffff/' | xauth -f $XAUTH nmerge -

docker run --gpus all -it --rm --ipc=host --runtime=nvidia \
    --env="DISPLAY=$DISPLAY" \
    --env="QT_X11_NO_MITSHM=1" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    --env="XAUTHORITY=$XAUTH" \
    --volume="$XAUTH:$XAUTH" \
    -v ~/Downloads/Illumination_invariant_databases:/workspace/databases \
    rtabmap_frontiers \
        rtabmap --SuperPoint/ModelPath /workspace/scripts/superpoint_v1.pt \
        --SuperGlue/Path /workspace/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py \
        --Kp/DetectorStrategy 11 \
        --Mem/UseOdomFeatures false \
        --Vis/CorNNType 6 

Say "Yes" to all startup dialogs, then Open Preferences->Source, set Source->Database, scroll down and set database path to any databases in "/workspace/databases". Say "Yes" to use odometry data and "Yes" to process all data. Click ok, new database, then start.

The difference between rtabmap-reprocess and rtabmap is that, in the standalone app, the map update thread is not the main thread. This may cause issues when the python interpreter was not initialized in the same thread context. For reference, these are the two python classes involved: https://github.com/introlab/rtabmap/blob/master/corelib/src/python/PythonInterface.cpp https://github.com/introlab/rtabmap/blob/master/corelib/src/python/PyMatcher.cpp

PythonInterface is initialized on the main thread (constructor here, created here), while the rtabmap class runs on a second thread called RtabmapThread. Normally PythonInterface should switch the python context between threads, so the problem must be in that context switching.

I see two solutions:

  • Fix python thread switching context (preferred to handle all new coming python-based approaches), or
  • Implement superglue in C++ (similarly to SuperPoint to avoid calling python from c++)

matlabbe avatar Sep 14 '22 16:09 matlabbe

I wonder why the latest release (https://github.com/introlab/rtabmap/releases/tag/0.20.16) can not reproduce this issue.

cdb0y511 avatar Sep 17 '22 02:09 cdb0y511

Tested with 0.20.16 using the docker image (checking out 0.20.16 inside and rebuilding it) and the same problem happens. Digging further, I tried to write a minimal example of how python is used across threads inside rtabmap, based on this example:

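#include <Python.h>
#include <string>
#include <thread>

// Assumed helper (not shown in the original snippet): RAII guard that
// starts the interpreter on construction and finalizes it on destruction.
struct initialize
{
    initialize() { Py_Initialize(); }
    ~initialize() { Py_Finalize(); }
};

// runs in a new thread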
void f(PyInterpreterState* interp, const char* tname)
{
    std::string code = R"PY(

from __future__ import print_function
import sys

print("TNAME: sys.xxx={}".format(getattr(sys, 'xxx', 'attribute not set')))

    )PY";

    code.replace(code.find("TNAME"), 5, tname);
    
    
    PyThreadState* threadState = PyThreadState_New(interp);
    PyEval_RestoreThread(threadState);
    

    //sub_interpreter::thread_scope scope(interp);
    PyRun_SimpleString(code.c_str());
    
    PyThreadState_Clear(threadState);
    PyThreadState_DeleteCurrent();
}

int main()
{
    initialize init;
    
    PyThreadState* mainState;
    mainState = PyEval_SaveThread();

    PyEval_RestoreThread(mainState);

    PyRun_SimpleString(R"PY(

# set sys.xxx, it will only be reflected in t4, which runs in the context of the main interpreter

from __future__ import print_function
import sys

sys.xxx = ['abc']
print('main: setting sys.xxx={}'.format(sys.xxx))

    )PY");
    
    mainState = PyEval_SaveThread();

    // Simulating here a thread using the main python interpreter
    std::thread t4{f, mainState->interp, "t4(main)"};
    t4.join();
    
    PyEval_RestoreThread(mainState);

    return 0;
}

This works as expected. I then checked where exactly the code is freezing on superglue side, and it seems it happens when it calls load_state_dict here:

self.load_state_dict(torch.load(str(path)))

Maybe related issue: https://github.com/huggingface/transformers/issues/8649

~Note that when you say:~

~I wonder why the latest release (https://github.com/introlab/rtabmap/releases/tag/0.20.16) can not reproduce this issue.~

~do you mean the windows cuda binaries? If so, there could be an issue with the pytorch version used.~ EDIT: The windows binaries don't have python support.

matlabbe avatar Sep 24 '22 22:09 matlabbe

At least on ROS it works. I tested by adding ros noetic in the rtabmap_frontiers docker image.

Launch the docker image:

export XAUTH=/tmp/.docker.xauth
touch $XAUTH
xauth nlist $DISPLAY | sed -e 's/^..../ffff/' | xauth -f $XAUTH nmerge -

docker run --gpus all -it --rm --ipc=host --runtime=nvidia     \
    --env="DISPLAY=$DISPLAY"     \
    --env="QT_X11_NO_MITSHM=1"     \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw"     \
    --env="XAUTHORITY=$XAUTH"     \
    --volume="$XAUTH:$XAUTH"     \
    --network host \
    --privileged  \
    rtabmap_frontiers 

Install ros noetic and build rtabmap_ros in the container, then after launching realsense D435i like in this tutorial, from inside the container:

roslaunch rtabmap_ros rtabmap.launch args:="-d  \
        --SuperPoint/ModelPath /workspace/scripts/superpoint_v1.pt \
        --SuperGlue/Path /workspace/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py \
        --Reg/RepeatOnce false \
        --Vis/CorGuessWinSize 0 \
        --Kp/DetectorStrategy 11 \
        --Vis/FeatureType 11 \
        --Mem/UseOdomFeatures false \
        --Vis/CorNNType 6" \
      depth_topic:=/camera/aligned_depth_to_color/image_raw \
      rgb_topic:=/camera/color/image_raw \
      camera_info_topic:=/camera/color/camera_info \
      approx_sync:=false \
      wait_imu_to_init:=true \
      imu_topic:=/rtabmap/imu

To make sure no second matching is done after superglue, set --Reg/RepeatOnce false --Vis/CorGuessWinSize 0. In this example, both the rtabmap and rgbd_odometry nodes use SuperPoint/SuperGlue. To use SuperPoint/SuperGlue only on the rtabmap node, replace args with rtabmap_args.
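The parameter set above can also be kept in one place and rendered into the args string. The following is a minimal Python sketch, not part of rtabmap_ros: `build_rtabmap_args` and `SUPERGLUE_PARAMS` are hypothetical names, while the parameter names, values and paths are the ones used in the command above.

```python
# Hypothetical helper (not part of rtabmap_ros): renders RTAB-Map parameters
# into the string passed to roslaunch as args:="...".
SUPERGLUE_PARAMS = {
    "SuperPoint/ModelPath": "/workspace/scripts/superpoint_v1.pt",
    "SuperGlue/Path": "/workspace/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py",
    "Reg/RepeatOnce": "false",      # set so that no second matching follows SuperGlue
    "Vis/CorGuessWinSize": "0",     # set so that no second matching follows SuperGlue
    "Kp/DetectorStrategy": "11",    # SuperPoint
    "Vis/FeatureType": "11",        # SuperPoint
    "Mem/UseOdomFeatures": "false",
    "Vis/CorNNType": "6",           # SuperGlue
}


def build_rtabmap_args(params, leading_flags=("-d",)):
    """Join leading flags and `--Name value` pairs into one string."""
    parts = list(leading_flags)
    for name, value in params.items():
        parts += ["--" + name, str(value)]
    return " ".join(parts)


args = build_rtabmap_args(SUPERGLUE_PARAMS)
print('roslaunch rtabmap_ros rtabmap.launch args:="{}"'.format(args))
```

Passing the same string as rtabmap_args instead of args would restrict it to the rtabmap node, as described above.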

matlabbe avatar Sep 24 '22 23:09 matlabbe

Hello, What's the status of this issue?

mattiasmar avatar May 31 '23 14:05 mattiasmar

  • On ROS (rtabmap and odometry nodes): working
  • Reprocess tool: working
  • Matching tool: working
  • standalone: freezing

matlabbe avatar Jun 04 '23 21:06 matlabbe

I am trying to use Superglue via that dockerfile. I installed ROS Noetic using the standard method and everything works okay until I use catkin_make, which brings an error where it cannot find empy. I have found there is an issue when multiple interpreters are installed, but everything seems to be pointing to the missing dependency. The problem is that the environment uses conda, and it cannot install empy due to a conflict with other packages installed in the image. How did you manage to install ROS and rtabmap_ros?

GVMCOTESA avatar Jun 19 '23 12:06 GVMCOTESA

Step 3 from https://github.com/introlab/rtabmap_ros#build-from-source

matlabbe avatar Jun 20 '23 00:06 matlabbe

I am doing that inside docker, but there are now two versions of python. In the last-mentioned issue, you do not seem to have a problem with that. Are you using catkin_make or catkin build to build on noetic? If I use pip instead of conda or the pytorch image, the problem is that pytorch installed with pip is not built with the C++11 ABI, so rtabmap fails to build with "undefined reference to" errors. If you install libtorch with the C++11 ABI, then pytorch is not installed, and if you install both, they clash and python segfaults. Building with conda results in the error I mentioned in my previous comment.

To clarify, I am using the commands in Step 1 to 3 inside docker.

GVMCOTESA avatar Jun 22 '23 10:06 GVMCOTESA

I'm testing SuperPoint/SuperGlue on freiburg2_pioneer_slam3 dataset. I can see detections, but no matches. @cdb0y511 @matlabbe can you confirm that superglue is expected to work on this dataset?

This is how I test it:

./install/rtabmap/bin/rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures false --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 11 --Reg/RepeatOnce false --SuperGlue/Path ~/ws/src/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/ws/src/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt --PyMatcher/Cuda false --SuperPoint/Cuda false /data/TUM/rgbd_dataset_freiburg2_pioneer_slam3

Thanks!

mattiasmar avatar Jun 23 '23 07:06 mattiasmar

Superglue should work for loop closure detection, not odometry (unless you choose F2F odometry). Here is a small comparison between different approaches.

  • Default parameters (GFTT features for odom and loop closure, with standard nearest neighbor):
    rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures true --Vis/CorNNType 1 --Kp/DetectorStrategy 8 --Vis/FeatureType 8 --Reg/RepeatOnce true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.8 rgbd_dataset_freiburg2_pioneer_slam3
    (Screenshot from 2023-06-25 17-54-43)

  • Default parameters for odom (GFTT), but using SuperPoint + default NN matching for loop closure:
    rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures true --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 11 --Reg/RepeatOnce false --SuperGlue/Path ~/workspace/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/superpoint_v1.pt --PyMatcher/Cuda true --SuperPoint/Cuda true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.6 rgbd_dataset_freiburg2_pioneer_slam3
    (Screenshot from 2023-06-25 18-12-45)

  • Default parameters for odom (GFTT), but using SuperPoint + SuperGlue for loop closure:
    rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures false --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 8 --Reg/RepeatOnce false --SuperGlue/Path ~/workspace/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/superpoint_v1.pt --PyMatcher/Cuda true --SuperPoint/Cuda true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.8 rgbd_dataset_freiburg2_pioneer_slam3
    (Screenshot from 2023-06-25 17-43-55_odomgftt_loopsuperpoint)

  • SuperPoint for odom, Superpoint+Superglue for loop closure detection:
    rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures true --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 11 --Reg/RepeatOnce false --SuperGlue/Path ~/workspace/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/superpoint_v1.pt --PyMatcher/Cuda true --SuperPoint/Cuda true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.6 rgbd_dataset_freiburg2_pioneer_slam3
    (Screenshot from 2023-06-25 17-40-04)
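Since these command lines differ only in a handful of parameters, it can help to diff them mechanically. The following is a small Python sketch with hypothetical helper names (`parse_params`, `diff_params`). It only assumes that RTAB-Map parameters are written as `--Group/Name value`, as in the commands above, and the two example commands are shortened versions of the first and third ones.

```python
import shlex


def parse_params(command):
    """Collect `--Group/Name value` pairs from a command line string."""
    tokens = shlex.split(command)
    params = {}
    i = 0
    while i < len(tokens) - 1:
        token = tokens[i]
        if token.startswith("--") and "/" in token:
            params[token[2:]] = tokens[i + 1]
            i += 2  # skip the value that was just consumed
        else:
            i += 1  # executable name, non-RTAB-Map flag, or dataset path
    return params


def diff_params(command_a, command_b):
    """Return {name: (value_a, value_b)} for parameters whose values differ."""
    a, b = parse_params(command_a), parse_params(command_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


gftt = ("rtabmap-rgbd_dataset --cameras 1 --Mem/UseOdomFeatures true "
        "--Vis/CorNNType 1 --Kp/DetectorStrategy 8 --Vis/FeatureType 8 "
        "--Reg/RepeatOnce true rgbd_dataset_freiburg2_pioneer_slam3")
superglue = ("rtabmap-rgbd_dataset --cameras 1 --Mem/UseOdomFeatures false "
             "--Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 8 "
             "--Reg/RepeatOnce false rgbd_dataset_freiburg2_pioneer_slam3")

changes = diff_params(gftt, superglue)
for name, (before, after) in changes.items():
    print("{}: {} -> {}".format(name, before, after))
```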

For that dataset, it seems there are 3 seconds missing while the robot was rotating, around the 1671st frame. I fixed the code to make Odom/ResetCountdown work with that tool.

Looking at the results, the lack of loop closures for GFTT is more related to its binary descriptors than to the absence of Superglue. Here is a comparison of matching superpoint features with and without superglue, respectively:

(Screenshot from 2023-06-25 18-06-10)
(Screenshot from 2023-06-25 18-06-45)

matlabbe avatar Jun 26 '23 01:06 matlabbe

@GVMCOTESA what is your base image? Is it the one from nvidia like in the frontiers dockerfile?

matlabbe avatar Jun 26 '23 01:06 matlabbe

I did it with native installed libraries. For docker, you may use frontiers dockerfile. If you want to go ROS, I also recently created an image for rtabmap_ros.

matlabbe avatar Jun 29 '23 06:06 matlabbe

Yes, It is the frontiers one. I will try the new image, thank you.

GVMCOTESA avatar Jun 29 '23 06:06 GVMCOTESA

  • On ROS (rtabmap and odometry nodes): working
  • Reprocess tool: working
  • Matching tool: working
  • standalone: freezing

By "matching tool", do you mean rtabmap-databaseViewer? When I try to induce a loop closure in the database viewer, I get this error whenever SuperPoint/SuperGlue is called more than once:

Superglue execution times:  0.8277442455291748 [-0.827739953994751]
[ INFO] (2023-07-06 21:01:10.415) PythonInterface.cpp:48::~PythonInterface() Py_Finalize() with thread = 673533952
[ INFO] (2023-07-06 21:01:10.654) DatabaseViewer.cpp:8262::refineConstraint() (1 ->2) Registration time: 1.713461 s
[ INFO] (2023-07-06 21:01:10.680) PythonInterface.cpp:25::PythonInterface() Py_Initialize() with thread = 673533952
[ INFO] (2023-07-06 21:01:10.706) PyMatcher.cpp:33::PyMatcher() path = /root/ws/src/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py
[ INFO] (2023-07-06 21:01:10.706) PyMatcher.cpp:34::PyMatcher() model = indoor
Segmentation fault (core dumped)

@matlabbe Are you seeing this too?

mattiasmar avatar Jul 06 '23 21:07 mattiasmar

The matching tool is not rtabmap-databaseViewer. For database viewer, it may work just one time, then seg fault the second time (when trying to re-initialize the python classes).

matlabbe avatar Jul 10 '23 21:07 matlabbe

It worked. I was not aware that rtabmap_ros must be in its own container, separated from the rest; I now have a separate container for simulating the robot. There must be something misconfigured on my side, though: the map rotates wildly each time a loop closure is detected, and it doesn't seem to converge. Is there a set of calibration parameters that can help reduce this? (image)

GVMCOTESA avatar Jul 14 '23 10:07 GVMCOTESA

Can you share the database?

matlabbe avatar Jul 14 '23 16:07 matlabbe

Regarding the app freezing on superglue initialization (https://github.com/introlab/rtabmap/issues/896#issuecomment-1257078993). Here is a gdb log when it happens:

#0  0x00007ffff078f1f1 in PyThreadState_Clear (tstate=0x7fff0c6dcb00) at ../Python/pystate.c:764
#1  0x00007ffefd33794d in pybind11::gil_scoped_acquire::dec_ref() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#2  0x00007ffefd33798d in pybind11::gil_scoped_acquire::~gil_scoped_acquire() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#3  0x00007ffefd6efbcd in torch::autograd::PyFunctionTensorPreHook::~PyFunctionTensorPreHook() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#4  0x00007ffefd6efbed in torch::autograd::PyFunctionTensorPreHook::~PyFunctionTensorPreHook() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#5  0x00007fffd80125cf in torch::autograd::AutogradMeta::~AutogradMeta() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#6  0x00007fffeea9da42 in c10::TensorImpl::~TensorImpl() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libc10.so
#7  0x00007fffeea9dbed in c10::TensorImpl::~TensorImpl() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libc10.so
#8  0x00007ffefd704d78 in THPVariable_clear(THPVariable*) () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#9  0x00007ffefd705125 in THPVariable_subclass_dealloc(_object*) () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#10 0x00007ffff0878165 in _Py_DECREF (filename=<synthetic pointer>, lineno=541, op=<optimized out>) at ../Include/object.h:478
#11 _Py_XDECREF (op=<optimized out>) at ../Include/object.h:541
#12 free_keys_object (keys=0x7ffd3f192020) at ../Objects/dictobject.c:584
#13 0x00007ffff0878818 in dictkeys_decref (dk=0x7ffd3f192020) at ../Objects/dictobject.c:324
#14 dict_dealloc (mp=0x7fff459e0340) at ../Objects/dictobject.c:1998
#15 0x00007ffff08743a6 in odict_dealloc (self=0x7fff459e0340) at ../Objects/odictobject.c:1367
#16 0x00007ffff067cd9e in _Py_DECREF (filename=<synthetic pointer>, lineno=4971, op=<optimized out>) at ../Include/object.h:478
#17 call_function (tstate=0x7ffda6635b30, pp_stack=0x7fff14ae3930, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4971
#18 0x00007ffff0684ef6 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3469
#19 0x00007ffff07d2e4b in _PyEval_EvalCodeWithName
    (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=0x0, kwargs=0x7fff14ae3b60, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x7fff4d8b3730, name=0x7fff14d43530, qualname=0x7fff4d8b2a30) at ../Python/ceval.c:4298
#20 0x00007ffff08b0124 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:436
#21 0x00007ffff08b2417 in _PyObject_FastCallDict (callable=callable@entry=0x7fff4d8b78b0, args=args@entry=0x7fff14ae3b50, nargsf=nargsf@entry=2, kwargs=kwargs@entry=0x0)
    at ../Objects/call.c:96
#22 0x00007ffff08b252d in _PyObject_Call_Prepend (callable=0x7fff4d8b78b0, obj=<optimized out>, args=0x7fff14d164c0, kwargs=0x0) at ../Objects/call.c:888
#23 0x00007ffff084bd47 in slot_tp_init (self=0x7fff14d384f0, args=0x7fff14d164c0, kwds=0x0) at ../Objects/typeobject.c:6790
#24 0x00007ffff08511b9 in type_call (type=<optimized out>, args=0x7fff14d164c0, kwds=0x0) at ../Objects/typeobject.c:994
#25 0x00007ffff08b0b2b in _PyObject_MakeTpCall (callable=0x7ffe5da1e7b0, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at ../Objects/call.c:159
#26 0x00007ffff067cdf3 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffe5da1e7b0) at ../Include/cpython/abstract.h:125
#27 _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:115
#28 call_function (tstate=0x7ffda6635b30, pp_stack=0x7fff14ae3d58, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4963
#29 0x00007ffff067e46d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3500
#30 0x00007ffff068806b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=5, globals=<optimized out>) at ../Objects/call.c:284
#31 0x00007ffff08b0f23 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:147
#32 _PyObject_FastCall (nargs=<optimized out>, args=<optimized out>, func=<optimized out>) at ../Include/cpython/abstract.h:147
#33 _PyObject_CallFunctionVa (callable=0x7fff14b4e8b0, format=<optimized out>, va=va@entry=0x7fff14ae3ec0, is_size_t=is_size_t@entry=0) at ../Objects/call.c:941
#34 0x00007ffff08b218f in _PyObject_CallFunctionVa (is_size_t=0, va=0x7fff14ae3ec0, format=<optimized out>, callable=<optimized out>) at ../Objects/call.c:914
#35 PyObject_CallFunction (callable=<optimized out>, format=<optimized out>) at ../Objects/call.c:961
#36 0x00007ffff73eaf33 in rtabmap::PyMatcher::match(cv::Mat const&, cv::Mat const&, std::vector<cv::KeyPoint, std::allocator<cv::KeyPoint> > const&, std::vector<cv::KeyPoint, std::allocator<cv::KeyPoint> > const&, cv::Size_<int> const&) () at /home/mathieu/workspace/rtabmap/build/bin/librtabmap_core.so.0.21
#37 0x00007ffff724cc2c in rtabmap::RegistrationVis::computeTransformationImpl(rtabmap::Signature&, rtabmap::Signature&, rtabmap::Transform, rtabmap::RegistrationInfo&) const ()
    at /home/mathieu/workspace/rtabmap/build/bin/librtabmap_core.so.0.21

I think the problem is that:

  1. we create the python interpreter in the main thread of rtabmap,
  2. the matching call is done inside a sub-thread, calling python from c++, so the GIL has to be acquired,
  3. then, in the superglue python code, pytorch's c-functions are called; pybind11 releases the GIL for the c-code and re-acquires it afterwards, triggering some memory clearing that makes the app freeze.
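Step 3 relies on native code releasing the GIL and re-acquiring it later. The same mechanism can be observed from plain Python with `time.sleep`, which releases the GIL while it blocks. The sketch below only illustrates that release/re-acquire behaviour; it does not reproduce the freeze, and `blocking_native_call` and `run_concurrently` are hypothetical names.

```python
import threading
import time


def blocking_native_call(seconds):
    # time.sleep is implemented in C and releases the GIL while blocked,
    # then re-acquires it before returning to Python code.
    time.sleep(seconds)


def run_concurrently(n_threads, seconds):
    """Start n_threads blocking calls and return the total wall time."""
    threads = [
        threading.Thread(target=blocking_native_call, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


elapsed = run_concurrently(4, 0.3)
# If the GIL were held during the native call, four calls would take about
# 4 * 0.3 s; because it is released, they overlap.
print("4 blocking calls of 0.3 s took {:.2f} s".format(elapsed))
```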

The difference between standalone and ros is that, for the latter, the python interpreter runs in the same thread as the one pytorch is running on. Questions:

  • Should we create one python interpreter per thread to avoid this situation? It looks like overkill if rtabmap and odometry are both loading the same python modules.
  • Is there a way to acquire/release the GIL from a thread other than the one that created the python interpreter? Based on the example above, it seems so, but we could extend that example with python code calling another c-function that releases/acquires the GIL.
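For illustration only, the "same thread" condition observed with ROS can be mimicked in plain Python by routing every call through one dedicated worker thread. This is a sketch of that idea under stated assumptions, not the fix that was eventually committed; `SingleThreadRunner` and `record_thread` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleThreadRunner:
    """Runs every submitted call on one dedicated worker thread."""

    def __init__(self):
        # max_workers=1 means the pool never creates a second thread.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def call(self, fn, *args, **kwargs):
        # Blocks the caller until the worker thread has executed fn.
        return self._pool.submit(fn, *args, **kwargs).result()

    def close(self):
        self._pool.shutdown(wait=True)


worker_ids = set()


def record_thread():
    # Stand-in for a call that must always run on the same thread.
    worker_ids.add(threading.get_ident())


runner = SingleThreadRunner()
callers = [
    threading.Thread(target=lambda: runner.call(record_thread))
    for _ in range(3)
]
for caller in callers:
    caller.start()
for caller in callers:
    caller.join()
runner.call(record_thread)  # also called from the main thread
runner.close()

print("distinct threads that executed the call:", len(worker_ids))
```

Whatever thread issues the call, the work itself always lands on the same worker, at the cost of serializing all callers.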

matlabbe avatar Sep 10 '23 21:09 matlabbe

Fixed in https://github.com/introlab/rtabmap/commit/f1cd819673804e0c5beaad1f5b316b12c4dcf140

matlabbe avatar Sep 17 '23 08:09 matlabbe

I am doing that inside docker, but there are now two versions of python. In the last-mentioned issue, you do not seem to have a problem with that. Are you using catkin_make or catkin build to build on noetic? If I use pip instead of conda or the pytorch image, the problem is that pytorch installed with pip is not built with the C++11 ABI, so rtabmap fails to build with "undefined reference to" errors. If you install libtorch with the C++11 ABI, then pytorch is not installed, and if you install both, they clash and python segfaults. Building with conda results in the error I mentioned in my previous comment.

To clarify, I am using the commands in Step 1 to 3 inside docker.

I am glad we could skip docker for now, because I found it has some performance issues of its own. Unfortunately, you have to build libtorch from source to avoid the undefined reference errors caused by the C++11 ABI mismatch. Related: https://github.com/introlab/rtabmap/issues/1063
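To check which ABI an installed PyTorch was built with before trying to link rtabmap against it, PyTorch exposes `torch.compiled_with_cxx11_abi()`. The sketch below wraps it in a hypothetical helper, `pytorch_cxx11_abi`, that returns None when torch cannot be imported at all.

```python
def pytorch_cxx11_abi():
    """Return True/False for the importable PyTorch, or None without torch."""
    try:
        import torch
    except ImportError:
        # Covers a missing package as well as a broken install that fails
        # with "undefined symbol" while loading its shared libraries.
        return None
    return bool(torch.compiled_with_cxx11_abi())


abi = pytorch_cxx11_abi()
if abi is None:
    print("torch could not be imported in this interpreter")
elif abi:
    print("torch was built with the C++11 ABI")
else:
    print("torch was built with the pre-C++11 ABI")
```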

cdb0y511 avatar Sep 22 '23 06:09 cdb0y511

@cdb0y511 I too noticed that building pytorch from source avoids the "undefined reference" errors. However, I also see a severe (>>10x) performance loss with pytorch compiled from source. I'm testing on CPU only, and I compile with the flag ENV USE_MKLDNN=1. Before that, I install Intel's oneDNN like this:

git clone --branch v3.4-pc --recursive https://github.com/oneapi-src/oneDNN.git /one-dnn
mkdir -p build && cd build && cmake .. && make -j  && make install

Question: did you also see a loss in inference speed when building pytorch from source? Did you overcome it in some way?

mattiasmar avatar Dec 17 '23 18:12 mattiasmar

Sorry to bother everyone. Can SuperGlue be used with RTAB-Map without docker? Are the following steps correct?

  1. roslaunch realsense2_camera rs_camera.launch align_depth:=true
  2. roslaunch rtabmap_launch rtabmap.launch args:="-d
    --SuperPoint/ModelPath /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt
    --SuperGlue/Path /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py
    --Reg/RepeatOnce false
    --Vis/CorGuessWinSize 0
    --Kp/DetectorStrategy 11
    --Vis/FeatureType 11
    --Mem/UseOdomFeatures false
    --Vis/CorNNType 6"
    rtabmap_args:="--delete_db_on_start"
    depth_topic:=/camera/aligned_depth_to_color/image_raw
    rgb_topic:=/camera/color/image_raw
    camera_info_topic:=/camera/color/camera_info
    approx_sync:=false

I get this:

Features2d.cpp:594::create() SupertPoint Torch feature cannot be used as RTAB-Map is not built with the option enabled. GFTT/ORB is used instead.

qetuo105487900 avatar Feb 19 '24 09:02 qetuo105487900

See https://github.com/introlab/rtabmap/issues/1221#issuecomment-1953191870

matlabbe avatar Feb 19 '24 21:02 matlabbe

I ran the two commands below; the only difference between them is whether --Vis/CorNNType 6 is included. With it:

roslaunch rtabmap_launch rtabmap.launch args:="-d
--delete_db_on_start
--SuperPoint/ModelPath /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt
--SuperGlue/Path /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py
--Reg/RepeatOnce false
--Vis/CorGuessWinSize 0
--Kp/DetectorStrategy 11
--Vis/FeatureType 11
--Mem/UseOdomFeatures false
--Vis/CorNNType 6"
depth_topic:=/rs_d435i/aligned_depth_to_color/image_raw
rgb_topic:=/rs_d435i/color/image_raw
camera_info_topic:=/rs_d435i/color/camera_info
approx_sync:=false

I get this error:

[ERROR] (2024-02-06 00:05:32.077) PyMatcher.cpp:63::PyMatcher() Module "demo_superglue" could not be imported! (File="/home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py")
[ERROR] (2024-02-06 00:05:32.077) PyMatcher.cpp:64::PyMatcher() Traceback (most recent call last):
  File "/home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py", line 51, in <module>
    import torch
  File "/home/lun/.local/lib/python3.8/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import * # noqa: F403
ImportError: /home/lun/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZNK5torch3jit5Graph8toStringEb

But when I run it without --Vis/CorNNType 6:

roslaunch rtabmap_launch rtabmap.launch args:="-d
--delete_db_on_start
--SuperPoint/ModelPath /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt
--SuperGlue/Path /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py
--Reg/RepeatOnce false
--Vis/CorGuessWinSize 0
--Kp/DetectorStrategy 11
--Vis/FeatureType 11
--Mem/UseOdomFeatures false"
depth_topic:=/rs_d435i/aligned_depth_to_color/image_raw
rgb_topic:=/rs_d435i/color/image_raw
camera_info_topic:=/rs_d435i/color/camera_info
approx_sync:=false

I get:

Parameters.cpp:1149::parseArguments() Parameter migration from "SuperGlue/Path" to "PyMatcher/Path" (value=/home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py).

Is that correct?

qetuo105487900 avatar Feb 23 '24 09:02 qetuo105487900

If you don't use --Vis/CorNNType 6, you are not using superglue; you are still using superpoint, but with the standard KNN matching approach.
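As a quick way to read a parameter set, the rule can be written down as a small Python sketch. `describe_pipeline` is a hypothetical helper, and the only values it knows are the two stated in this thread (11 for SuperPoint, 6 for SuperGlue).

```python
SUPERPOINT = "11"  # Kp/DetectorStrategy value for SuperPoint (from this thread)
SUPERGLUE = "6"    # Vis/CorNNType value for SuperGlue (from this thread)


def describe_pipeline(params):
    """Summarize which detector and matcher a parameter dict selects."""
    if params.get("Kp/DetectorStrategy") == SUPERPOINT:
        detector = "SuperPoint"
    else:
        detector = "another detector"
    if params.get("Vis/CorNNType") == SUPERGLUE:
        matcher = "SuperGlue"
    else:
        matcher = "standard nearest-neighbor matching"
    return detector + " + " + matcher


with_superglue = {"Kp/DetectorStrategy": "11", "Vis/CorNNType": "6"}
without_superglue = {"Kp/DetectorStrategy": "11"}

print(describe_pipeline(with_superglue))
print(describe_pipeline(without_superglue))
```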

So you get this error when using superglue:

ImportError: /home/lun/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZNK5torch3jit5Graph8toStringEb

Has pytorch been built from source? Uninstall the one installed with pip if you rebuilt pytorch from source.
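The missing symbol demangles to `torch::jit::Graph::toString(bool) const`, which suggests that libtorch_python.so is being resolved against a libtorch build it was not compiled with. One way to spot a leftover pip install next to a source build is to list every `torch` package visible on the import path. The sketch below does that with a hypothetical helper, `find_torch_installs`; it only looks at `sys.path` and does not import torch.

```python
import os
import sys


def find_torch_installs(search_paths=None):
    """Return the directories of all `torch` packages found on the path."""
    paths = sys.path if search_paths is None else search_paths
    found = []
    for entry in paths:
        # An empty sys.path entry means the current directory.
        candidate = os.path.join(entry or ".", "torch", "__init__.py")
        if os.path.isfile(candidate):
            location = os.path.realpath(os.path.dirname(candidate))
            if location not in found:
                found.append(location)
    return found


installs = find_torch_installs()
for index, location in enumerate(installs):
    print("{}: {}".format(index, location))
if len(installs) > 1:
    print("More than one torch package is visible; the first one is imported.")
```

If two locations show up (for example one under ~/.local and one from the source build), uninstalling the pip copy as suggested above leaves a single candidate.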

matlabbe avatar Feb 24 '24 22:02 matlabbe