rmw_fastrtps

Backup Discovery Server Seg Faults on Repeated Use

Open a-krawciw opened this issue 1 year ago • 8 comments

Bug report

Required Info:

  • Operating System: Ubuntu 22.04 ROS 2 Humble
  • Installation type: Binaries
  • Version or commit hash: 6.2.6-1jammy.20240517.161150
  • DDS implementation: rmw: rmw_fastrtps_cpp
  • Client library (if applicable): N/A

Steps to reproduce issue

The primary server is running on the robot over the network with id 0.

The first time I ran

$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 1 -p 11811 -b 
### Server is running ###                                                               
  Participant Type:   BACKUP                
  Security:           NO                                                                                                                                                        
  Server ID:          1                                                                                                                                                         
  Server GUID prefix: 44.53.03.5f.45.50.52.4f.53.49.4d.41                               
  Server Addresses:   UDPv4:[0.0.0.0]:11811                                                                                                                                     
^C                                                                                                                                                                              
### Server shut down ### 

everything worked as expected.

The second time I had to use ID 2:

$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 1 -p 11811 -b 
Segmentation fault (core dumped)    
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 2 -p 11811 -b
### Server is running ###                                                               
  Participant Type:   BACKUP                
  Security:           NO                                                                                                                                                        
  Server ID:          2                                                                                                                                                         
  Server GUID prefix: 44.53.03.5f.45.50.52.4f.53.49.4d.41                               
  Server Addresses:   UDPv4:[0.0.0.0]:11811                                                                                                                                     
^C                                                                                                                                                                              
### Server shut down ### 

The next time I ran the same command

$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 1 -p 11811 -b 
Segmentation fault (core dumped)    
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 2 -p 11811 -b
fast-discovery-server: malloc.c:2617: sysmalloc: Assertion '(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && 
((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted (core dumped)                                                                   
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 3 -p 11811 -b
### Server is running ###                                                               
  Participant Type:   BACKUP                
  Security:           NO                                                                                                                                                        
  Server ID:          3                                                                                                                                                         
  Server GUID prefix: 44.53.03.5f.45.50.52.4f.53.49.4d.41                               
  Server Addresses:   UDPv4:[0.0.0.0]:11811                                                                                                                                     
^C                                                                                                                                                                              
### Server shut down ### 

Expected behavior

It should be possible to run, stop, and re-run the discovery server with the same ID as long as the original server has closed. Instead, each time I run the server I have to increase the ID for it to start. This persists through a reboot. Is there some cache file that is not being deleted properly?

a-krawciw avatar Oct 05 '24 20:10 a-krawciw

Update: If I delete the JSON and db3 files that are created in my home directory before running the backup server again, it functions properly.

There is a bug related to re-loading or restarting the backup server at a later time.
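The workaround above can be sketched as a small helper that moves the stale files aside instead of deleting them, so they remain available for a bug report. This is only a sketch: the function name is hypothetical, and the `server-*.json` / `server-*.db` / `server-*.db3` patterns are an assumption based on the file names mentioned in this thread, so adjust them to whatever actually appears in your directory.

```shell
# Move stale backup-server state out of the way rather than deleting it.
# ASSUMPTION: state files match server-*.json, server-*.db or server-*.db3
# in the directory the backup server was started from.
# The body runs in a subshell "( ... )" so its variables stay private.
stash_backup_state() (
    dir="${1:-$PWD}"
    stash="$dir/discovery-backup-stash.$$"
    moved=0
    for f in "$dir"/server-*.json "$dir"/server-*.db "$dir"/server-*.db3; do
        [ -e "$f" ] || continue      # an unmatched glob stays literal; skip it
        mkdir -p "$stash"
        mv -- "$f" "$stash"/
        moved=$((moved + 1))
    done
    echo "$moved"                    # number of files moved
)

# Hypothetical usage before restarting the backup server:
#   stash_backup_state "$HOME"
#   fast-discovery-server -i 1 -p 11811 -b
```

If the server starts cleanly after the files are stashed, the stashed copies are the most useful thing to attach when reporting the crash.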

a-krawciw avatar Oct 06 '24 15:10 a-krawciw

The primary server is running on the robot over the network with id 0.

root@tomoyafujita:~/ros2_ws/colcon_ws# fastdds discovery --server-id 0
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811

This already binds port 11811.

fast-discovery-server -i 1 -p 11811 -b

This should fail with the following error, since the port cannot be allocated.

root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 1 -p 11811 -b
2024-10-07 15:37:53.593 [RTPS_PARTICIPANT Error] Discovery Server wasn't able to allocate the specified listening port. -> Function createParticipant
Server creation failed with the given settings. Please review locators setup.
2024-10-07 15:37:53.596 [DOMAIN_PARTICIPANT Error] Problem creating RTPSParticipant -> Function enable
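To tell this port-allocation failure apart from the reported crash, one option is to check whether UDP port 11811 is already bound on the host before starting a second server. The sketch below is an assumption-laden helper, not part of Fast DDS: the function name is hypothetical, and it simply scans the output of `ss -H -uln` for any field ending in `:PORT`.

```shell
# Succeeds (exit 0) if any whitespace-separated field on stdin ends in ":PORT".
# Intended input: `ss -H -uln` (UDP sockets, numeric, no header line).
udp_port_listed() {
    awk -v port="$1" '
        { for (i = 1; i <= NF; i++) if ($i ~ (":" port "$")) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical usage:
#   ss -H -uln | udp_port_listed 11811 && echo "UDP 11811 is already bound on this host"
```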

Can you provide the complete procedure step by step, including the primary server setup? I tried to reproduce the issue, but it does not happen with a rolling source build.

fujitatomoya avatar Oct 07 '24 22:10 fujitatomoya

With humble, I see the expected error as follows, the same as with rolling. I would like to see the complete procedure to make this happen.

root@tomoyafujita:~/ros2_ws/humble_ws# fastdds discovery --server-id 0
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811

and then,

root@tomoyafujita:~/ros2_ws/humble_ws# fast-discovery-server -i 1 -p 11811 -b
2024-10-07 15:46:35.175 [RTPS_PARTICIPANT Error] Discovery Server wasn't able to allocate the specified listening port. -> Function createParticipant
Server creation failed with the given settings. Please review locators setup.
2024-10-07 15:46:35.178 [DOMAIN_PARTICIPANT Error] Problem creating RTPSParticipant -> Function enable

I tried to make it happen using other ports and stopping/starting the discovery server with the backup option, but so far I cannot reproduce the issue in my local environment.

fujitatomoya avatar Oct 07 '24 22:10 fujitatomoya

Hi @fujitatomoya thanks for looking into this.

Sorry if the setup wasn't clear. I have two physical machines connected by ethernet. That's how port 11811 will be available for both discovery servers.

My setup:

Robot (IP 192.168.131.1): fast-discovery-server -i 0 -p 11811

Laptop (IP 192.168.131.10): fast-discovery-server -i 1 -p 11811 -b

The ROS_DISCOVERY_SERVER variable is set to ROS_DISCOVERY_SERVER="192.168.131.1:11811;192.168.131.10:11811" on both machines.
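As a sketch of that configuration, the variable can be built from the two addresses in this thread. The helper name is hypothetical. My understanding of the ROS 2 discovery server tutorial for Humble is that a server's position in the semicolon-separated list corresponds to its server ID, so the robot (ID 0) comes first and the laptop (ID 1) second; treat that as an assumption to verify against your Fast DDS version.

```shell
# Join server locators with ';' in server-ID order (first argument = ID 0).
# The body runs in a subshell "( ... )" so the IFS change does not leak out.
discovery_server_list() (
    IFS=';'
    printf '%s\n' "$*"
)

# Addresses taken from this thread: robot first, laptop second.
export ROS_DISCOVERY_SERVER="$(discovery_server_list 192.168.131.1:11811 192.168.131.10:11811)"
echo "$ROS_DISCOVERY_SERVER"
```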

If there are no files related to the backup server on my laptop, the setup works fine. If the server-*.json and .db files are present, I get the seg fault.

a-krawciw avatar Oct 08 '24 15:10 a-krawciw

Using 2 machines in the same network:

  • rolling with https://github.com/ros2/ros2/commit/e1dbaf865827927d0c8ffcd9146e71028f650695: I cannot reproduce the segfault with the discovery server.
root@edgemaster:~/docker_ws/colcon_ws# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@edgemaster:~/docker_ws/colcon_ws# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          2
  Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          2
  Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

fujitatomoya avatar Oct 09 '24 01:10 fujitatomoya

Even with the released humble environment, I cannot reproduce the issue.

@a-krawciw I need more information to reproduce this issue. Can you tell me how to make this happen, step by step and command by command?

root@edgemaster:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@edgemaster:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

root@tomoyafujita:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          2
  Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          2
  Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

fujitatomoya avatar Oct 09 '24 01:10 fujitatomoya

My setup is as follows, in this order:

Robot (Computer 1): fast-discovery-server -i 0 -p 11811

Laptop (Computer 2): fast-discovery-server -i 1 -p 11811 -b, then Ctrl+C

Robot (Computer 1): Ctrl+C, then fast-discovery-server -i 0 -p 11811

Laptop (Computer 2): fast-discovery-server -i 1 -p 11811 -b

This results in the seg fault for me.
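When scripting a sequence like this, it can help to classify how each server run ended. The helper below is a hypothetical sketch, not part of any ROS 2 tooling. It relies on the convention in bash and dash that a process killed by signal N is reported as exit status 128+N (SIGSEGV is 11 and SIGABRT is 6 on Linux), and on GNU `timeout` returning 124 when it had to stop the command.

```shell
# Map an exit status to the failure modes seen in this thread.
classify_exit() {
    case "$1" in
        0)   echo "clean exit" ;;
        124) echo "stopped by timeout" ;;
        134) echo "aborted (SIGABRT)" ;;
        139) echo "segmentation fault (SIGSEGV)" ;;
        *)   echo "other failure ($1)" ;;
    esac
}

# Hypothetical laptop-side repro, left commented out because it needs
# fast-discovery-server on PATH and the primary server running on the robot:
#   timeout -s INT 10 fast-discovery-server -i 1 -p 11811 -b; classify_exit $?
#   timeout -s INT 10 fast-discovery-server -i 1 -p 11811 -b; classify_exit $?
```

A first run reporting "stopped by timeout" followed by a second run reporting a segmentation fault would match the behavior described above.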

a-krawciw avatar Oct 11 '24 13:10 a-krawciw

I cannot observe this problem with the latest humble release.

  • machine-A
root@tomoyafujita:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
  • machine-B
root@edgemaster:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C^C
### Server shut down ###
root@edgemaster:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

Can you check whether the problem still happens after apt upgrade? The only difference I can see here is that my ros-humble-rmw-fastrtps-cpp version is up to date.

root@tomoyafujita:~# dpkg -s ros-humble-rmw-fastrtps-cpp
Package: ros-humble-rmw-fastrtps-cpp
Status: install ok installed
Priority: optional
Section: misc
Installed-Size: 352
Maintainer: Michel Hidalgo <[email protected]>
Architecture: amd64
Version: 6.2.7-1jammy.20240728.212513
Depends: libc6 (>= 2.32), libgcc-s1 (>= 3.3.1), libstdc++6 (>= 11), ros-humble-fastcdr, ros-humble-fastrtps, ros-humble-ament-cmake, ros-humble-fastrtps-cmake-module, ros-humble-rcpputils, ros-humble-rcutils, ros-humble-rmw, ros-humble-rmw-dds-common, ros-humble-rmw-fastrtps-shared-cpp, ros-humble-rosidl-cmake, ros-humble-rosidl-runtime-c, ros-humble-rosidl-runtime-cpp, ros-humble-rosidl-typesupport-fastrtps-c, ros-humble-rosidl-typesupport-fastrtps-cpp, ros-humble-tracetools, ros-humble-ros-workspace
Description: Implement the ROS middleware interface using eProsima FastRTPS static code generation in C++.
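The two version strings in this thread can be compared with a short sketch like the one below. The function name is hypothetical, and GNU `sort -V` only approximates Debian version ordering; on a Debian or Ubuntu host, `dpkg --compare-versions A lt B` is the authoritative check.

```shell
# True if version $1 sorts strictly before $2 under GNU `sort -V`.
version_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

reported="6.2.6-1jammy.20240517.161150"     # from the issue description
up_to_date="6.2.7-1jammy.20240728.212513"   # from the dpkg -s output above
if version_older "$reported" "$up_to_date"; then
    echo "reported package is older; consider: sudo apt update && sudo apt upgrade"
fi
```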

fujitatomoya avatar Oct 11 '24 15:10 fujitatomoya