Backup Discovery Server Seg Faults on Repeated Use
Bug report
Required Info:
- Operating System: Ubuntu 22.04 ROS 2 Humble
- Installation type: Binaries
- Version or commit hash: 6.2.6-1jammy.20240517.161150
- DDS implementation: rmw: rmw_fastrtps_cpp
- Client library (if applicable): N/A
Steps to reproduce issue
The primary server is running on the robot over the network with id 0.
The first time I ran
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 1
Server GUID prefix: 44.53.03.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
everything worked as expected.
The second time I had to use ID 2:
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 1 -p 11811 -b
Segmentation fault (core dumped)
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 2
Server GUID prefix: 44.53.03.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
The next time I ran the same command
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 1 -p 11811 -b
Segmentation fault (core dumped)
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 2 -p 11811 -b
fast-discovery-server: malloc.c:2617: sysmalloc: Assertion '(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) &&
((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted (core dumped)
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 3 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 3
Server GUID prefix: 44.53.03.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
Expected behavior
It should be possible to run, stop, and re-run the discovery server with the same ID once the original server process has exited. Instead, each time I restart the server I have to increase the ID before it will start. This persists through a reboot. Is there some cache file that is not being deleted properly?
Update: If I delete the json and db3 files that are created in my home directory before running the backup server again, it functions properly.
The bug therefore appears to be related to the backup server reloading these files when it is restarted at a later time.
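As a stopgap, the workaround from the update above can be scripted. This is a minimal sketch, not an official cleanup procedure: the function names are hypothetical, and the file patterns (`server-*.json`, `server-*.db`, `server-*.db3`) are assumptions based on the file names reported in this thread, so adjust them to whatever the backup server actually leaves behind on your machine.

```shell
# List leftover backup-server state files in a directory (default: $HOME).
# The name patterns are assumptions taken from this thread, not from Fast DDS docs.
list_backup_state() {
    find "${1:-$HOME}" -maxdepth 1 -type f \
        \( -name 'server-*.json' -o -name 'server-*.db' -o -name 'server-*.db3' \)
}

# Remove those files so the next backup server start begins from a clean state.
clean_backup_state() {
    list_backup_state "${1:-$HOME}" | while IFS= read -r f; do
        rm -f -- "$f" && echo "removed $f"
    done
}
```

Running `list_backup_state` first shows what would be deleted; `clean_backup_state` followed by `fast-discovery-server -i 1 -p 11811 -b` then matches the manual workaround.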
The primary server is running on the robot over the network with id 0.
root@tomoyafujita:~/ros2_ws/colcon_ws# fastdds discovery --server-id 0
### Server is running ###
Participant Type: SERVER
Security: NO
Server ID: 0
Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
This already binds port 11811.
fast-discovery-server -i 1 -p 11811 -b
This should fail with the following error, since the port cannot be allocated.
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 1 -p 11811 -b
2024-10-07 15:37:53.593 [RTPS_PARTICIPANT Error] Discovery Server wasn't able to allocate the specified listening port. -> Function createParticipant
Server creation failed with the given settings. Please review locators setup.
2024-10-07 15:37:53.596 [DOMAIN_PARTICIPANT Error] Problem creating RTPSParticipant -> Function enable
Can you provide the complete procedure step by step, including the primary server setup?
I tried to reproduce the issue, but it does not happen with a rolling source build.
With humble, I see the expected error as follows, the same as with rolling. I would like to see the complete procedure that makes this happen.
root@tomoyafujita:~/ros2_ws/humble_ws# fastdds discovery --server-id 0
### Server is running ###
Participant Type: SERVER
Security: NO
Server ID: 0
Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
and then,
root@tomoyafujita:~/ros2_ws/humble_ws# fast-discovery-server -i 1 -p 11811 -b
2024-10-07 15:46:35.175 [RTPS_PARTICIPANT Error] Discovery Server wasn't able to allocate the specified listening port. -> Function createParticipant
Server creation failed with the given settings. Please review locators setup.
2024-10-07 15:46:35.178 [DOMAIN_PARTICIPANT Error] Problem creating RTPSParticipant -> Function enable
I tried to make it happen by using other ports and by stopping and starting the discovery server with the backup option, but so far I cannot reproduce the issue in my local environment.
Hi @fujitatomoya thanks for looking into this.
Sorry if the setup wasn't clear. I have two physical machines connected by ethernet. That's how port 11811 will be available for both discovery servers.
My setup
Robot IP: 192.168.131.1
fast-discovery-server -i 0 -p 11811
Laptop IP: 192.168.131.10
fast-discovery-server -i 1 -p 11811 -b
The ROS_DISCOVERY_SERVER variable is set to ROS_DISCOVERY_SERVER="192.168.131.1:11811;192.168.131.10:11811" on both machines.
If there are no files related to the backup server on my laptop, the setup works fine. If the files server-*.json and .db files are present, I get the seg fault.
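For anyone trying to reproduce this, the environment described above could be set up on both machines roughly as follows. This is a sketch: the IP addresses and port come from this post, while the variable names `ROBOT_SERVER` and `LAPTOP_SERVER` are hypothetical. As far as I know, on Humble the position of each entry in `ROS_DISCOVERY_SERVER` corresponds to the server ID, which is why the primary (ID 0) comes first.

```shell
# Addresses taken from the setup described above.
ROBOT_SERVER="192.168.131.1:11811"    # primary server, started with -i 0
LAPTOP_SERVER="192.168.131.10:11811"  # backup server, started with -i 1 -b

# Entries are separated by semicolons; the primary is listed first.
export ROS_DISCOVERY_SERVER="${ROBOT_SERVER};${LAPTOP_SERVER}"
echo "ROS_DISCOVERY_SERVER=${ROS_DISCOVERY_SERVER}"
```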
Using 2 machines in the same network:
- With rolling at https://github.com/ros2/ros2/commit/e1dbaf865827927d0c8ffcd9146e71028f650695, I cannot reproduce the segfault with the discovery server.
root@edgemaster:~/docker_ws/colcon_ws# fast-discovery-server -i 0 -p 11811
### Server is running ###
Participant Type: SERVER
Security: NO
Server ID: 0
Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@edgemaster:~/docker_ws/colcon_ws# fast-discovery-server -i 0 -p 11811
### Server is running ###
Participant Type: SERVER
Security: NO
Server ID: 0
Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 1
Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 1
Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 2
Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 2
Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
Even with the released humble environment, I cannot reproduce the issue.
@a-krawciw I need more information to reproduce this issue. Can you tell me how to make this happen, step by step and command by command?
root@edgemaster:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
Participant Type: SERVER
Security: NO
Server ID: 0
Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@edgemaster:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
Participant Type: SERVER
Security: NO
Server ID: 0
Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 1
Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 1
Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 2
Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 2
Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
My setup is as follows.
Robot Computer 1:
fast-discovery-server -i 0 -p 11811
Laptop Computer 2:
fast-discovery-server -i 1 -p 11811 -b
Ctrl+C
Robot Computer 1:
Ctrl+C
fast-discovery-server -i 0 -p 11811
Laptop Computer 2:
fast-discovery-server -i 1 -p 11811 -b
This results in the seg fault for me.
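The laptop side of the sequence above could be scripted so that each run reports how the server exited. This is only a sketch under a few assumptions: the function names are hypothetical, `timeout` is the GNU coreutils version, and the exit codes follow the usual Linux shell convention of 128 plus the signal number (139 for SIGSEGV, 134 for SIGABRT).

```shell
# Map a shell exit status to a short label.
# 139 = 128 + SIGSEGV (11), 134 = 128 + SIGABRT (6) on Linux.
classify_exit() {
    case "$1" in
        0)   echo "clean" ;;
        130) echo "interrupted" ;;
        134) echo "abort" ;;
        139) echo "segfault" ;;
        *)   echo "other($1)" ;;
    esac
}

# Start the backup server with the given ID, send SIGINT after a delay
# (default 5 seconds, standing in for Ctrl+C), and print how it exited.
run_backup_once() {
    timeout --signal=INT --preserve-status "${2:-5}" \
        fast-discovery-server -i "$1" -p 11811 -b
    classify_exit "$?"
}
```

With the primary running on the robot, calling `run_backup_once 1` twice in a row on the laptop should print `segfault` on the second call if the problem is present.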
I cannot observe this problem with the latest humble release.
- machine-A
root@tomoyafujita:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
Participant Type: SERVER
Security: NO
Server ID: 0
Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
Participant Type: SERVER
Security: NO
Server ID: 0
Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
- machine-B
root@edgemaster:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 1
Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C^C
### Server shut down ###
root@edgemaster:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
Participant Type: BACKUP
Security: NO
Server ID: 1
Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
Server Addresses: UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
Can you check whether the problem still happens after apt upgrade? The only difference I can see here is that my ros-humble-rmw-fastrtps-cpp version is up to date.
root@tomoyafujita:~# dpkg -s ros-humble-rmw-fastrtps-cpp
Package: ros-humble-rmw-fastrtps-cpp
Status: install ok installed
Priority: optional
Section: misc
Installed-Size: 352
Maintainer: Michel Hidalgo <[email protected]>
Architecture: amd64
Version: 6.2.7-1jammy.20240728.212513
Depends: libc6 (>= 2.32), libgcc-s1 (>= 3.3.1), libstdc++6 (>= 11), ros-humble-fastcdr, ros-humble-fastrtps, ros-humble-ament-cmake, ros-humble-fastrtps-cmake-module, ros-humble-rcpputils, ros-humble-rcutils, ros-humble-rmw, ros-humble-rmw-dds-common, ros-humble-rmw-fastrtps-shared-cpp, ros-humble-rosidl-cmake, ros-humble-rosidl-runtime-c, ros-humble-rosidl-runtime-cpp, ros-humble-rosidl-typesupport-fastrtps-c, ros-humble-rosidl-typesupport-fastrtps-cpp, ros-humble-tracetools, ros-humble-ros-workspace
Description: Implement the ROS middleware interface using eProsima FastRTPS static code generation in C++.
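To compare installed versions across machines, the Version field can be pulled out of the `dpkg -s` output shown above. A small sketch follows; the helper name `version_from_status` is hypothetical and it only parses text on stdin. The original report lists 6.2.6-1jammy.20240517.161150, while the output above shows 6.2.7-1jammy.20240728.212513.

```shell
# Print the Version field from `dpkg -s`-style text read on stdin.
version_from_status() {
    awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Example (run on each machine):
#   dpkg -s ros-humble-rmw-fastrtps-cpp | version_from_status
```

If the versions differ, running `sudo apt update && sudo apt upgrade` on the older machine and repeating the test would show whether the package update changes the behavior.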