rmw_fastrtps icon indicating copy to clipboard operation
rmw_fastrtps copied to clipboard

TypeSupport::deserializeROSmessage(..) crashes deserialising msg published by rmw_microxrcedds

Open gavanderhoorn opened this issue 3 years ago • 11 comments

Bug report

tl;dr: rmw_fastrtps_cpp::TypeSupport::deserializeROSmessage(..) appears to attempt to resize(..) the MultiDOFJointTrajectoryPoint::transforms field to an unexpectedly large value, suggesting either serialisation errors on the rmw_microxrcedds side, or state/memory corruption or alignment issues on the rmw_fastrtps side.

'Evidence' gathered from Wireshark captures suggests serialisation errors are unlikely, as hex dumps/dissections of (XRCE-)DDS traffic appears to be as-it-should-be.

Sometimes, the crash occurs after a few messages. Other times, it takes 50 minutes or more before the crash happens.


Original: let me start by saying I was unsure as to where to post this. Either this issue tracker (ie: rmw_fastrtps) or the rmw_microxrcedds one. In the end, as I only observe the crashes in rmw_fastrtps, I decided to post here.

I realise the fact rmw_microxrcedds is involved means the problem could also be on that side, but my debugging thus far appears to suggest that's not the case (see discussion below).

Required Info:

  • Operating System:
    • Ubuntu Jammy (but also observed on Focal)
  • Installation type:
    • apt (ie: binary)
  • Version or commit hash:
    • "latest" packages for Humble (and Galactic) as of 2022-11-03 (sorry for being so vague)
  • DDS implementation:
    • FastDDS on action client side, rmw_microxrcedds on action server side
  • Client library (if applicable):
    • so far only observed with rclcpp (client side)

Steps to reproduce issue

I'd prefer sharing an MRE as well, but don't have one at the moment. In addition, the crashes have been hard to reproduce.

The best description I can provide:

  1. create a micro-ROS FollowJointTrajectory action server
  2. create a regular ROS 2 FollowJointTrajectory action client (registering feedback callback is not required)
  3. periodically submit new goals to server from client, waiting for their completion

Expected behavior

Client can submit an "infinite" number of goals.

Server processes those goals while publishing feedback to the client.

Actual behavior

Client crashes after a "random time", completely shutting down the node containing the action client:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

The server does not appear to be affected. After restart of the client, everything continues as normal.

Additional information

I have several backtraces of the crashes (some with debug symbols). I've included them at the end of this post.

Some observations from these traces:

  • all contain references to rmw_fastrtps_cpp::TypeSupport::deserializeROSmessage(..) and eventually cdr_deserialize(..) et al.
  • all appear to crash at the same point: rosidl_typesupport_fastrtps_cpp/trajectory_msgs/msg/detail/dds_fastrtps/multi_dof_joint_trajectory_point__type_support.cpp:179. This is generated code. I've included the code around that line below, after the GDB traces
  • all appear to attempt to resize(..) the MultiDOFJointTrajectoryPoint::transforms field to a "ridiculously" large value (1157637866 for instance in the third crash, frame #14)

Some more information about the server: the action server does not populate any of the multi_dof_* fields (these). It does initialise them to either [], "" or 0. The transforms field/list is completely empty, as are all the others.

I have Wireshark captures of all of the messages exchanged between the server and the client, and -- for the feedback messages I've checked -- none of those appear to contain any "garbage" or unexpected data.

I've used jseldent/wireshark-dds-xrce to dissect the data coming from the server, combined with gavanderhoorn/micro_xrce_dds_ros_idl_dissectors to dissect the actual ROS 2 message data.

As it starts to look like the RMW deserialiser is somehow deserialising corrupt/incorrect data, I tried to look for the values for __new_size mentioned in the backtraces in the Wireshark captures as binary data. I noticed the following:

# __new_size hex l-endian found where?
1 1157681529 0x79D10045 Y FollowJointTrajectory_Feedback message, request_id + object_id fields
2 1157646959 0x6F4A0045 Y FollowJointTrajectory_Feedback message, request_id + object_id fields
3 1157637866 0xEA260045 Y FollowJointTrajectory_Feedback message, request_id + object_id fields
4 1668008540 0x5CCA6B63 Y multiple messages, multiple fields

All of the values could be found in the network traffic, but in 'unexpected' places. Sometimes at the start of a payload, sometimes even before the actual ROS message payload, sometimes at a 'random' location in the serialised ROS message payload.

The request_id + object_id fields mentioned in the where? column are micro-ROS specific (actually: XRCE-DDS specific) packet fields, which always occur before the serialised ROS message payload.

The last value appears to be a Unix Epoch and converts to Wednesday, November 9, 2022 4:42:20 PM GMT+01:00. This actually corresponds to the day the Wireshark capture was recorded.

I'm hoping this is sufficient information to at least trigger some people here and start a discussion. As the crashes are not easily reproduced, debugging has been difficult. Individually, all steps investigated (ie: network traffic, (de)serialisation, etc) appear to work correctly.

Perhaps accumulated internal state of the RMW causes a desync/memory corruption somewhere?


GDB backtraces:

Crash 1
Thread 1 "my_node" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737339898176) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) where
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737339898176)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737339898176)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737339898176, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007ffff7642476 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007ffff76287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7aa2bbe in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7aae24c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7aae2b7 in std::terminate() ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff7aae518 in __cxa_throw ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff7aa27cc in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff664b6ae in __gnu_cxx::new_allocator<geometry_msgs::msg::Transform_<std::allocator<void> > >::allocate (this=0x555555985880, __n=<optimized out>)
    at /usr/include/c++/11/ext/new_allocator.h:103
#11 std::allocator_traits<std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::allocate (__n=<optimized out>, __a=...)
    at /usr/include/c++/11/bits/alloc_traits.h:464
#12 std::_Vector_base<geometry_msgs::msg::Transform_<std::allocator<void> >
::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::_M_allocate (__n=<optimized out>, this=<optimized out>)
    at /usr/include/c++/11/bits/stl_vector.h:346
#13 std::vector<geometry_msgs::msg::Transform_<std::allocator<void> >, std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::_M_default_append (__n=1157681529, this=0x555555985880)
    at /usr/include/c++/11/bits/vector.tcc:635
#14 std::vector<geometry_msgs::msg::Transform_<std::allocator<void> >, std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::resize (
    __new_size=1157681529, this=0x555555985880)
    at /usr/include/c++/11/bits/stl_vector.h:940
#15 trajectory_msgs::msg::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., 
    ros_message=...)
    at ./.obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/trajectory_msgs/msg/detail/dds_fastrtps/multi_dof_joint_trajectory_point__type_support.cpp:179
#16 0x00007ffff11d5697 in control_msgs::action::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., ros_message=...)
    at ./obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/control_msgs/action/detail/dds_fastrtps/follow_joint_trajectory__type_support.cpp:1153
#17 0x00007ffff11d6181 in control_msgs::action::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., ros_message=...)
    at ./obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/control_msgs/action/detail/dds_fastrtps/follow_joint_trajectory__type_support.cpp:2739
#18 0x00007ffff7245959 in rmw_fastrtps_cpp::TypeSupport::deserializeROSmessage
    (this=0x55555588d3c0, deser=..., ros_message=0x555555985640, 
    impl=0x7ffff11e3200 <control_msgs::action::typesupport_fastrtps_cpp::_FollowJointTrajectory_FeedbackMessage__callbacks>)
    at ./src/type_support_common.cpp:110
#19 0x00007ffff71f3ab6 in rmw_fastrtps_shared_cpp::TypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) ()
   from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#20 0x00007ffff6e2ab4a in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#21 0x00007ffff6aee29b in eprosima::fastdds::dds::DataReaderImpl::read_or_take_next_sample(void*, eprosima::fastdds::dds::SampleInfo*, bool) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#22 0x00007ffff71ef8e9 in rmw_fastrtps_shared_cpp::_take(char const*, rmw_subscription_s const*, void*, bool*, rmw_message_info_s*, rmw_subscription_allocation_s*) () from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#23 0x00007ffff78fd3ba in rcl_take (subscription=0x555555a12578, 
    ros_message=ros_message@entry=0x555555985640, 
    message_info=message_info@entry=0x7fffffffbe40, 
    allocation=allocation@entry=0x0) at ./src/rcl/subscription.c:516
#24 0x00007ffff7922e55 in rcl_action_take_feedback (
    action_client=0x555555a36510, ros_feedback=0x555555985640)
    at ./src/rcl_action/action_client.c:438
#25 0x00007ffff794a1db in rclcpp_action::ClientBase::take_data (

    this=0x555555a36530) at ./src/client.cpp:551
#26 0x00007ffff7dc787f in rclcpp::Executor::get_next_ready_executable_from_map
    (this=0x7fffffffd180, any_executable=..., 
    weak_groups_to_nodes=std::map with 5 elements = {...})
    at ./src/rclcpp/executor.cpp:864
#27 0x00007ffff7dcf632 in rclcpp::executors::MultiThreadedExecutor::run (
    this=0x7fffffffd180, this_thread_number=<optimized out>)
    at ./src/rclcpp/executors/multi_threaded_executor.cpp:85
#28 0x00007ffff7dcfa95 in rclcpp::executors::MultiThreadedExecutor::spin (
    this=0x7fffffffd180)
    at ./src/rclcpp/executors/multi_threaded_executor.cpp:62
#29 0x000055555555883a in main ()
Crash 2
[Switching to Thread 0x7fffd2ffd640 (LWP 831209)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140736733369920) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) where
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736733369920)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140736733369920)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140736733369920, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007ffff7642476 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007ffff76287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7aa2bbe in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7aae24c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7aae2b7 in std::terminate() ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff7aae518 in __cxa_throw ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff7aa27cc in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff664b6ae in __gnu_cxx::new_allocator<geometry_msgs::msg::Transform_<std::allocator<void> > >::allocate (this=0x7fffc40015c0, __n=<optimized out>)
    at /usr/include/c++/11/ext/new_allocator.h:103
#11 std::allocator_traits<std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::allocate (__n=<optimized out>, __a=...)
    at /usr/include/c++/11/bits/alloc_traits.h:464
#12 std::_Vector_base<geometry_msgs::msg::Transform_<std::allocator<void> >,
::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::_M_allocate (__n=<optimized out>, this=<optimized out>)
    at /usr/include/c++/11/bits/stl_vector.h:346
#13 std::vector<geometry_msgs::msg::Transform_<std::allocator<void> >, std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::_M_default_append (__n=1157646959, this=0x7fffc40015c0)
    at /usr/include/c++/11/bits/vector.tcc:635
#14 std::vector<geometry_msgs::msg::Transform_<std::allocator<void> >, std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::resize (
    __new_size=1157646959, this=0x7fffc40015c0)
    at /usr/include/c++/11/bits/stl_vector.h:940
#15 trajectory_msgs::msg::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., 
    ros_message=...)
    at ./.obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/trajectory_msgs/msg/detail/dds_fastrtps/multi_dof_joint_trajectory_point__type_support.cpp:179
#16 0x00007ffff11d5697 in control_msgs::action::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., ros_message=...)
    at ./obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/control_msgs/action/detail/dds_fastrtps/follow_joint_trajectory__type_support.cpp:1153
#17 0x00007ffff11d6181 in control_msgs::action::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., ros_message=...)
    at ./obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/control_msgs/action/detail/dds_fastrtps/follow_joint_trajectory__type_support.cpp:2739
#18 0x00007ffff7245959 in rmw_fastrtps_cpp::TypeSupport::deserializeROSmessage
    (this=0x5555559bc0d0, deser=..., ros_message=0x7fffc4001380, 
    impl=0x7ffff11e3200 <control_msgs::action::typesupport_fastrtps_cpp::_FollowJointTrajectory_FeedbackMessage__callbacks>)
    at ./src/type_support_common.cpp:110
#19 0x00007ffff71f3ab6 in rmw_fastrtps_shared_cpp::TypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) ()
   from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#20 0x00007ffff6e2ab4a in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#21 0x00007ffff6aee29b in eprosima::fastdds::dds::DataReaderImpl::read_or_take_next_sample(void*, eprosima::fastdds::dds::SampleInfo*, bool) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#22 0x00007ffff71ef8e9 in rmw_fastrtps_shared_cpp::_take(char const*, rmw_subscription_s const*, void*, bool*, rmw_message_info_s*, rmw_subscription_allocation_s*) () from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#23 0x00007ffff78fd3ba in rcl_take (subscription=0x555555a11528, 
    ros_message=ros_message@entry=0x7fffc4001380, 
    message_info=message_info@entry=0x7fffd2ffaa20, 
    allocation=allocation@entry=0x0) at ./src/rcl/subscription.c:516
#24 0x00007ffff7922e55 in rcl_action_take_feedback (
    action_client=0x5555559e78e0, ros_feedback=0x7fffc4001380)
    at ./src/rcl_action/action_client.c:438
#25 0x00007ffff794a1db in rclcpp_action::ClientBase::take_data (
    this=0x5555559e7900) at ./src/client.cpp:551
#26 0x00007ffff7dc787f in rclcpp::Executor::get_next_ready_executable_from_map
    (this=0x7fffffffd180, any_executable=..., 
    weak_groups_to_nodes=std::map with 5 elements = {...})
    at ./src/rclcpp/executor.cpp:864
#27 0x00007ffff7dcf632 in rclcpp::executors::MultiThreadedExecutor::run (
    this=0x7fffffffd180, this_thread_number=<optimized out>)
    at ./src/rclcpp/executors/multi_threaded_executor.cpp:85
#28 0x00007ffff7adc2b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#29 0x00007ffff7694b43 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#30 0x00007ffff7726a00 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) 
Crash 3
Thread 14 "my_node" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffd3fff640 (LWP 855301)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140736750155328) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) where
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736750155328)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140736750155328)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140736750155328, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007ffff7642476 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007ffff76287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7aa2bbe in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7aae24c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7aae2b7 in std::terminate() ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff7aae518 in __cxa_throw ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff7aa27cc in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff664b6ae in __gnu_cxx::new_allocator<geometry_msgs::msg::Transform_<std::allocator<void> > >::allocate (this=0x7fffcc0018c0, __n=<optimized out>)
    at /usr/include/c++/11/ext/new_allocator.h:103
#11 std::allocator_traits<std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::allocate (__n=<optimized out>, __a=...)
    at /usr/include/c++/11/bits/alloc_traits.h:464
#12 std::_Vector_base<geometry_msgs::msg::Transform_<std::allocator<void> >, 
::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::_M_allocate (__n=<optimized out>, this=<optimized out>)
    at /usr/include/c++/11/bits/stl_vector.h:346
#13 std::vector<geometry_msgs::msg::Transform_<std::allocator<void> >, std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::_M_default_append (__n=1157637866, this=0x7fffcc0018c0)
    at /usr/include/c++/11/bits/vector.tcc:635
#14 std::vector<geometry_msgs::msg::Transform_<std::allocator<void> >, std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::resize (
    __new_size=1157637866, this=0x7fffcc0018c0)
    at /usr/include/c++/11/bits/stl_vector.h:940
#15 trajectory_msgs::msg::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., 
    ros_message=...)
    at ./.obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/trajectory_msgs/msg/detail/dds_fastrtps/multi_dof_joint_trajectory_point__type_support.cpp:179
#16 0x00007ffff11d5697 in control_msgs::action::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., ros_message=...)
    at ./obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/control_msgs/action/detail/dds_fastrtps/follow_joint_trajectory__type_support.cpp:1153
#17 0x00007ffff11d6181 in control_msgs::action::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., ros_message=...)
    at ./obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/control_msgs/action/detail/dds_fastrtps/follow_joint_trajectory__type_support.cpp:2739
#18 0x00007ffff7245959 in rmw_fastrtps_cpp::TypeSupport::deserializeROSmessage
    (this=0x5555559040a0, deser=..., ros_message=0x7fffcc001680, 
    impl=0x7ffff11e3200 <control_msgs::action::typesupport_fastrtps_cpp::_FollowJointTrajectory_FeedbackMessage__callbacks>)
    at ./src/type_support_common.cpp:110
#19 0x00007ffff71f3ab6 in rmw_fastrtps_shared_cpp::TypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) ()
   from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#20 0x00007ffff6e2ab4a in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#21 0x00007ffff6aee29b in eprosima::fastdds::dds::DataReaderImpl::read_or_take_next_sample(void*, eprosima::fastdds::dds::SampleInfo*, bool) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#22 0x00007ffff71ef8e9 in rmw_fastrtps_shared_cpp::_take(char const*, rmw_subscription_s const*, void*, bool*, rmw_message_info_s*, rmw_subscription_allocation_s*) () from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#23 0x00007ffff78fd3ba in rcl_take (subscription=0x5555558d6a88, 
    ros_message=ros_message@entry=0x7fffcc001680, 
    message_info=message_info@entry=0x7fffd3ffca20, 
    allocation=allocation@entry=0x0) at ./src/rcl/subscription.c:516
#24 0x00007ffff7922e55 in rcl_action_take_feedback (
    action_client=0x555555992fc0, ros_feedback=0x7fffcc001680)
    at ./src/rcl_action/action_client.c:438
#25 0x00007ffff794a1db in rclcpp_action::ClientBase::take_data (
    this=0x555555992fe0) at ./src/client.cpp:551
#26 0x00007ffff7dc787f in rclcpp::Executor::get_next_ready_executable_from_map
    (this=0x7fffffffd180, any_executable=..., 
    weak_groups_to_nodes=std::map with 5 elements = {...})
    at ./src/rclcpp/executor.cpp:864
#27 0x00007ffff7dcf632 in rclcpp::executors::MultiThreadedExecutor::run (
    this=0x7fffffffd180, this_thread_number=<optimized out>)
    at ./src/rclcpp/executors/multi_threaded_executo
Crash 4
Thread 15 "my_node" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffa3ffe640 (LWP 858961)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140735944844864) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) where
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140735944844864)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140735944844864)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140735944844864, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007ffff7642476 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007ffff76287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7aa2bbe in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7aae24c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7aae2b7 in std::terminate() ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff7aae518 in __cxa_throw ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff7aa27cc in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff664b6ae in __gnu_cxx::new_allocator<geometry_msgs::msg::Transform_<std::allocator<void> > >::allocate (this=0x7fff680018f0, __n=<optimized out>)
    at /usr/include/c++/11/ext/new_allocator.h:103
#11 std::allocator_traits<std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::allocate (__n=<optimized out>, __a=...)
    at /usr/include/c++/11/bits/alloc_traits.h:464
#12 std::_Vector_base<geometry_msgs::msg::Transform_<std::allocator<void> >, 
::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::_M_allocate (__n=<optimized out>, this=<optimized out>)
    at /usr/include/c++/11/bits/stl_vector.h:346
#13 std::vector<geometry_msgs::msg::Transform_<std::allocator<void> >, std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::_M_default_append (__n=1668008540, this=0x7fff680018f0)
    at /usr/include/c++/11/bits/vector.tcc:635
#14 std::vector<geometry_msgs::msg::Transform_<std::allocator<void> >, std::allocator<geometry_msgs::msg::Transform_<std::allocator<void> > > >::resize (
    __new_size=1668008540, this=0x7fff680018f0)
    at /usr/include/c++/11/bits/stl_vector.h:940
#15 trajectory_msgs::msg::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., 
    ros_message=...)
    at ./.obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/trajectory_msgs/msg/detail/dds_fastrtps/multi_dof_joint_trajectory_point__type_support.cpp:179
#16 0x00007fffd21d6697 in control_msgs::action::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., ros_message=...)
    at ./obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/control_msgs/action/detail/dds_fastrtps/follow_joint_trajectory__type_support.cpp:1153
#17 0x00007fffd21d7181 in control_msgs::action::typesupport_fastrtps_cpp::cdr_deserialize (cdr=..., ros_message=...)
    at ./obj-x86_64-linux-gnu/rosidl_typesupport_fastrtps_cpp/control_msgs/action/detail/dds_fastrtps/follow_joint_trajectory__type_support.cpp:2739
#18 0x00007ffff7245959 in rmw_fastrtps_cpp::TypeSupport::deserializeROSmessage
    (this=0x55555592f5b0, deser=..., ros_message=0x7fff680016b0, 
    impl=0x7fffd21e4200 <control_msgs::action::typesupport_fastrtps_cpp::_FollowJointTrajectory_FeedbackMessage__callbacks>)
    at ./src/type_support_common.cpp:110
#19 0x00007ffff71f3ab6 in rmw_fastrtps_shared_cpp::TypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) ()
   from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#20 0x00007ffff6e2ab4a in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#21 0x00007ffff6aee29b in eprosima::fastdds::dds::DataReaderImpl::read_or_take_next_sample(void*, eprosima::fastdds::dds::SampleInfo*, bool) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#22 0x00007ffff71ef8e9 in rmw_fastrtps_shared_cpp::_take(char const*, rmw_subscription_s const*, void*, bool*, rmw_message_info_s*, rmw_subscription_allocation_s*) () from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#23 0x00007ffff78fd3ba in rcl_take (subscription=0x555555a03ad8, 
    ros_message=ros_message@entry=0x7fff680016b0, 
    message_info=message_info@entry=0x7fffa3ffba20, 
    allocation=allocation@entry=0x0) at ./src/rcl/subscription.c:516
#24 0x00007ffff7922e55 in rcl_action_take_feedback (
    action_client=0x555555a327f0, ros_feedback=0x7fff680016b0)
    at ./src/rcl_action/action_client.c:438
#25 0x00007ffff794a1db in rclcpp_action::ClientBase::take_data (
    this=0x555555a32810) at ./src/client.cpp:551
#26 0x00007ffff7dc787f in rclcpp::Executor::get_next_ready_executable_from_map
    (this=0x7fffffffd020, any_executable=..., 
    weak_groups_to_nodes=std::map with 5 elements = {...})
    at ./src/rclcpp/executor.cpp:864
#27 0x00007ffff7dcf632 in rclcpp::executors::MultiThreadedExecutor::run (
    this=0x7fffffffd020, this_thread_number=<optimized out>)
    at ./src/rclcpp/executors/multi_threaded_executo

Crashing code

Click to expand
bool
ROSIDL_TYPESUPPORT_FASTRTPS_CPP_PUBLIC_trajectory_msgs
cdr_deserialize(
  eprosima::fastcdr::Cdr & cdr,
  trajectory_msgs::msg::MultiDOFJointTrajectoryPoint & ros_message)
{
  // Member: transforms
  {
    uint32_t cdrSize;
    cdr >> cdrSize;
    size_t size = static_cast<size_t>(cdrSize);
    ros_message.transforms.resize(size);                               // <<---- line 179
    for (size_t i = 0; i < size; i++) {
      geometry_msgs::msg::typesupport_fastrtps_cpp::cdr_deserialize(
        cdr, ros_message.transforms[i]);
    }
  }

  // Member: velocities
  {
    uint32_t cdrSize;
    cdr >> cdrSize;
    size_t size = static_cast<size_t>(cdrSize);
    ros_message.velocities.resize(size);
    for (size_t i = 0; i < size; i++) {
      geometry_msgs::msg::typesupport_fastrtps_cpp::cdr_deserialize(
        cdr, ros_message.velocities[i]);
    }
  }

  // Member: accelerations
  {
    uint32_t cdrSize;
    cdr >> cdrSize;
    size_t size = static_cast<size_t>(cdrSize);
    ros_message.accelerations.resize(size);
    for (size_t i = 0; i < size; i++) {
      geometry_msgs::msg::typesupport_fastrtps_cpp::cdr_deserialize(
        cdr, ros_message.accelerations[i]);
    }
  }

  // Member: time_from_start
  builtin_interfaces::msg::typesupport_fastrtps_cpp::cdr_deserialize(
    cdr, ros_message.time_from_start);

  return true;
}

gavanderhoorn avatar Nov 14 '22 15:11 gavanderhoorn

And just to show it: example hex dump of a datagram containing the serialised version of a control_msgs/action/FollowJointTrajectory::Feedback message. Below it is the dissected version Wireshark shows.

Note: this is just the XRCE-DDS WRITE_DATA serialized payload data, which should be what RMW FastRTPS also "sees" IIUC. It does not include the XRCE-DDS header(s).

All fields related to multi_dof are either 0, [empty] or "" as expected.

Hex dump (click to expand)
0000   c2 92 da 52 e1 d3 7d 27 5b 5b 49 52 44 3d d2 b0   ...R..}'[[IRD=..
0010   72 c5 6b 63 ff 66 55 36 01 00 00 00 00 00 00 00   r.kc.fU6........
0020   06 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 31 5f   ........joint_1_
0030   73 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 32 5f   s.......joint_2_
0040   6c 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 33 5f   l.......joint_3_
0050   75 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 34 5f   u.......joint_4_
0060   72 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 35 5f   r.......joint_5_
0070   62 00 50 30 0a 00 00 00 6a 6f 69 6e 74 5f 36 5f   b.P0....joint_6_
0080   74 00 31 2f 06 00 00 00 af 58 8e 07 59 57 f2 bf   t.1/.....X..YW..
0090   a2 00 f3 48 d9 e8 c6 3f be c7 b5 7c fc 8e dd bf   ...H...?...|....
00a0   00 8a 03 b8 49 7b 9e 3f d6 54 a7 5a 51 3e ec bf   ....I{.?.T.ZQ>..
00b0   20 c1 a0 16 ba b2 05 40 00 00 00 00 00 00 00 00    ......@........
00c0   00 00 00 00 00 00 00 00 00 00 00 00 06 00 00 00   ................
00d0   a1 e0 0f b2 f9 5d f2 bf 39 06 1e ea e5 80 c6 3f   .....]..9......?
00e0   05 ce 63 9e e5 87 dd bf 6e 2d c6 99 4c 3b a3 3f   ..c.....n-..L;.?
00f0   0c 6b 69 c2 3c c0 eb bf 39 e1 45 15 1e 7b 05 40   .ki.<...9.E..{.@
0100   06 00 00 00 72 6f 73 32 5e be ee 61 a5 a8 93 3f   ....ros2^..a...?
0110   b8 cd 99 1c f1 35 a3 3f ac 83 78 f8 a4 04 79 3f   .....5.?..x...y?
0120   0b 9c bc 86 41 05 b9 3f 4d ba b4 43 7b 4b c8 3f   ....A..?M..C{K.?
0130   fb 31 8a 3d 19 6b d5 3f 00 00 00 00 06 00 00 00   .1.=.k.?........
0140   73 bd 6d a6 42 3c d2 3f 37 36 3b 52 7d e7 d3 bf   s.m.B<.?76;R}...
0150   4f 3c 67 0b 08 ad e1 3f b3 ea 73 b5 15 fb c3 bf   O<g....?..s.....
0160   e9 96 1d e2 1f b6 b8 bf 93 ff c9 df bd a3 be 3f   ...............?
0170   00 00 00 00 00 00 00 00 06 00 00 00 74 63 70 5f   ............tcp_
0180   00 c8 1f 06 aa 82 5a 3f 40 9a 3e b5 d7 fc 69 3f   ......Z?@.>...i?
0190   00 e4 e6 47 79 5b 3c bf 70 43 23 ee 3d ed 7f bf   ...Gy[<.pC#.=...
01a0   80 72 7a 0f 26 85 8f bf 80 f3 6f ad 00 ce 9b 3f   .rz.&.....o....?
01b0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
01c0   00 00 00 00 00 00 00 00 00 00 00 00[00 00 00 00   ................  <-- multi_dof data
01d0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................    |
01e0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................    |
01f0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................    |
0200   00 00 00 00]                                      ....                _
Wireshark dissection (click to expand)
data
    serialized_data (control_msgs/action/FollowJointTrajectory_Feedback)
        header:
            stamp:
                sec: 1668007282
                nanosec: 911566591
            frame_id: ""
        joint_names:
            [0] "joint_1_s"
            [1] "joint_2_l"
            [2] "joint_3_u"
            [3] "joint_4_r"
            [4] "joint_5_b"
            [5] "joint_6_t"
        desired:
            positions:
                [0] -1,1463251395376
                [1] 0,17898098051046
                [2] -0,46185218981475
                [3] 0,029767181259453
                [4] -0,88260715204037
                [5] 2,712268997937
            velocities: [empty]
            accelerations: [empty]
            effort: [empty]
            time_from_start:
                sec: 0
                nanosec: 0
        actual:
            positions:
                [0] -1,1479432063625
                [1] 0,17580865794666
                [2] -0,46141949146042
                [3] 0,03756179215551
                [4] -0,86721647234967
                [5] 2,6851159727049
            velocities:
                [0] 0,019198021032649
                [1] 0,037520918599422
                [2] 0,0061079448856
                [3] 0,097736449618069
                [4] 0,18980351263004
                [5] 0,33466177949666
            accelerations: [empty]
            effort:
                [0] 0,284928
                [1] -0,311004
                [2] 0,552372
                [3] -0,1561
                [4] -0,096529
                [5] 0,119686
            time_from_start:
                sec: 0
                nanosec: 0
        error:
            positions:
                [0] 0,0016180668249004
                [1] 0,0031723225638019
                [2] -0,00043269835432685
                [3] -0,0077946108960568
                [4] -0,015390679690705
                [5] 0,027153025232099
            velocities: [empty]
            accelerations: [empty]
            effort: [empty]
            time_from_start:
                sec: 0
                nanosec: 0
        multi_dof_joint_names: [empty]
        multi_dof_desired:
            transforms: [empty]
            velocities: [empty]
            accelerations: [empty]
            time_from_start:
                sec: 0
                nanosec: 0
        multi_dof_actual:
            transforms: [empty]
            velocities: [empty]
            accelerations: [empty]
            time_from_start:
                sec: 0
                nanosec: 0
        multi_dof_error:
            transforms: [empty]
            velocities: [empty]
            accelerations: [empty]
            time_from_start:
                sec: 0
                nanosec: 0

gavanderhoorn avatar Nov 14 '22 15:11 gavanderhoorn

Some questions:

  • Could you show the same as in the previous message (hex dump and dissection) of the message that produces the error?
  • What are the sizes of the serialized message, your XRCE-DDS buffers (MTU), and the number of XRCE buffers?
  • Which transport are you using at the micro-ROS level?
  • Are you using a multithreaded micro-ROS app?

pablogs9 avatar Nov 15 '22 07:11 pablogs9

  • Could you show the same as in the previous message (hex dump and dissection) of the message that produces the error?

I can't be sure it's the exact message that causes the crash. I do know it's the only FollowJointTrajectory::Feedback message in the third Wireshark capture which contains __new_size in hex (ie: 0xEA260045).

As I write later (in response to your question): the complete msg consists of two fragments.

Hex dumps

These are the complete XRCE-DDS protocol payloads, so not just the serialized_data I showed in https://github.com/ros2/rmw_fastrtps/issues/644#issuecomment-1313954347.

First fragment (appears first in capture, 550 bytes on-the-wire, 508 bytes XRCE-DDS protocol data):

Click to expand
0000   81 80 a0 ff 0d 01 f4 01 07 01 08 02 ea 26 00 45   .............&.E
0010   c2 92 da 52 e1 d3 7d 27 5b 5b 49 52 44 3d d2 b0   ...R..}'[[IRD=..
0020   71 c5 6b 63 ad b2 5c 1f 01 00 00 00 00 00 00 00   q.kc..\.........
0030   06 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 31 5f   ........joint_1_
0040   73 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 32 5f   s.......joint_2_
0050   6c 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 33 5f   l.......joint_3_
0060   75 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 34 5f   u.......joint_4_
0070   72 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 35 5f   r.......joint_5_
0080   62 00 00 00 0a 00 00 00 6a 6f 69 6e 74 5f 36 5f   b.......joint_6_
0090   74 00 ca bf 06 00 00 00 2e 7b e8 35 24 33 f3 bf   t........{.5$3..
00a0   04 08 4d 4d 91 8f b1 3f 12 20 f9 5a be 9d dc bf   ..MM...?. .Z....
00b0   25 95 42 9b 99 f1 d3 3f dd b0 16 5d f9 14 d5 bf   %.B....?...]....
00c0   d2 69 50 e7 3d c7 fb 3f 00 00 00 00 00 00 00 00   .iP.=..?........
00d0   00 00 00 00 00 00 00 00 00 00 00 00 06 00 00 00   ................
00e0   41 71 99 0f 79 5e f3 bf 5a 9e 55 4e 77 fa a7 3f   Aq..y^..Z.UNw..?
00f0   94 f4 3a 53 2c 6e dc bf 76 86 75 98 08 80 d7 3f   ..:S,n..v.u....?
0100   35 40 ed cd 15 34 cc bf cb 1c 54 39 ef b2 f8 3f   [email protected]...?
0110   06 00 00 00 02 9d 49 9b f5 6d 6a ba ce 92 a8 3f   ......I..mj....?
0120   48 e6 14 dd 28 cc b8 3f ac 83 78 f8 a4 04 89 3f   H...(..?..x....?
0130   50 64 c2 1f aa 15 d0 3f 88 7c 52 d6 f9 86 df 3f   Pd.....?.|R....?
0140   10 11 dc 4f 89 c5 eb 3f 00 00 00 00 06 00 00 00   ...O...?........
0150   4a 77 d7 d9 90 7f da 3f ac c4 3c 2b 69 c5 b7 3f   Jw.....?..<+i..?
0160   ff 3d 78 ed d2 06 e3 3f c0 95 ec d8 08 c4 d3 bf   .=x....?........
0170   0f 5f 26 8a 90 ba c3 bf 27 d8 7f 9d 9b 36 cf 3f   ._&.....'....6.?
0180   00 00 00 00 00 00 00 00 06 00 00 00 74 63 70 5f   ............tcp_
0190   00 7c 42 dd 72 48 85 3f c8 cb 6c 1a 95 e2 95 3f   .|B.rH.?..l....?
01a0   80 c0 f4 2b ce 5f 67 bf 70 11 5f 30 96 ef ab bf   ...+._g.p._0....
01b0   52 d4 13 5d 16 6b bb bf 60 23 fe 89 ae 30 c8 3f   R..].k..`#...0.?
01c0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
01d0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
01e0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
01f0   00 00 00 00 00 00 00 00 00 00 00 00               ............

Note EA 26 00 45 at 0x000C.

Second (and last) fragment (74 bytes on-the-wire, 32 bytes XRCE-DDS protocol data):

Click to expand
0000   81 80 a1 ff 0d 03 18 00 00 00 00 00 00 00 00 00   ................
0010   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................

Wireshark dissection

This is the complete message, so reassembled by the dissector (including XRCE-DDS fields):

Click to expand
DDS-XRCE Protocol, sessionId: 0x81, streamId: 0x80
    sessionId: 0x81
    streamId: 0x80 (STREAMID_BUILTIN_RELIABLE)
    sequenceNr: 65441
    FRAGMENT
        submessageId: 13
        flags: 0x03, Last Fragment bit, Endianness bit
        submessageLength: 24
        payload: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    WRITE_DATA
        submessageId: 7
        flags: 0x01, FORMAT_DATA, Endianness bit
            .... 000. = DataFormat: FORMAT_DATA (0x0)
            .... ...1 = Endianness bit: Set (0x1)
        submessageLength: 520
        payload (WRITE_DATA_Payload_Data)
            request_id: ea26
            object_id: 0045 (DATAWRITER)
                [type: control_msgs::action::dds_::FollowJointTrajectory_FeedbackMessage_]
            data
                serialized_data (control_msgs::action::dds_::FollowJointTrajectory_FeedbackMessage_)
                    header:
                        stamp:
                            sec: 1668007281
                            nanosec: 526168749
                        frame_id: ""
                    joint_names:
                        [0] "joint_1_s"
                        [1] "joint_2_l"
                        [2] "joint_3_u"
                        [3] "joint_4_r"
                        [4] "joint_5_b"
                        [5] "joint_6_t"
                    desired:
                        positions:
                            [0] -1,1999857049738
                            [1] 0,068596917507449
                            [2] -0,44712790378662
                            [3] 0,31162109529608
                            [4] -0,32940515604674
                            [5] 1,7361430202205
                        velocities: [empty]
                        accelerations: [empty]
                        effort: [empty]
                        time_from_start:
                            sec: 0
                            nanosec: 0
                    actual:
                        positions:
                            [0] -1,2105646714545
                            [1] 0,046832779228507
                            [2] -0,44422443511923
                            [3] 0,36718954933722
                            [4] -0,2203395133497
                            [5] 1,5436851729772
                        velocities:
                            [0] 0,047995052581623
                            [1] 0,096865228629121
                            [2] 0,0122158897712
                            [3] 0,25132229901789
                            [4] 0,49261327676681
                            [5] 0,86786332700334
                        accelerations: [empty]
                        effort:
                            [0] 0,414036
                            [1] 0,092856
                            [2] 0,594583
                            [3] -0,30884
                            [4] -0,154131
                            [5] 0,243854
                        time_from_start:
                            sec: 0
                            nanosec: 0
                    error:
                        positions:
                            [0] 0,010392091157255
                            [1] 0,021372155893186
                            [2] -0,0028533007422525
                            [3] -0,054562276272494
                            [4] -0,10710277340316
                            [5] 0,18898565042425
                        velocities: [empty]
                        accelerations: [empty]
                        effort: [empty]
                        time_from_start:
                            sec: 0
                            nanosec: 0
                    multi_dof_joint_names: [empty]
                    multi_dof_desired:
                        transforms: [empty]
                        velocities: [empty]
                        accelerations: [empty]
                        time_from_start:
                            sec: 0
                            nanosec: 0
                    multi_dof_actual:
                        transforms: [empty]
                        velocities: [empty]
                        accelerations: [empty]
                        time_from_start:
                            sec: 0
                            nanosec: 0
                    multi_dof_error:
                        transforms: [empty]
                        velocities: [empty]
                        accelerations: [empty]
                        time_from_start:
                            sec: 0
                            nanosec: 0

Note EA26 in the request_id field, and 0045 in the object_id field.

  • What are the sizes of the serialized message

The complete message consists of two fragments (probably why you ask about the MTU below).

Wireshark tells me:

  • 550 bytes on-the-wire for the first fragment
  • 74 bytes on-the-wire for the second fragment
  • 508 bytes total XRCE-DDS payload for the first fragment
  • 32 bytes total XRCE-DDS payload for the second fragment
  • 500 bytes submessageLength for the first fragment
  • 24 bytes submessageLength for the second fragment
  • 520 bytes submessageLength for the complete message (ie: reassembled)

See also the dissection tree I include above.

your XRCE-DDS buffers (MTU)

Set to UCLIENT_UDP_TRANSPORT_MTU=512 bytes.

the number of XRCE buffers?

Set to RMW_UXRCE_STREAM_HISTORY=160.

  • Which transport are you using at the micro-ROS level?

UDPv4.

  • Are you using a multithreaded micro-ROS app?

Multi-tasked (same thing on this platform), but: yes.

gavanderhoorn avatar Nov 15 '22 09:11 gavanderhoorn

Oh and perhaps important/interesting to mention: the client doesn't have a feedback callback registered.

Given the fact the backtraces seem to point towards deserialisation failures, I'm not sure that's relevant/affects anything, but I thought I'd mention it in any case.

gavanderhoorn avatar Nov 15 '22 11:11 gavanderhoorn

RMW_UXRCE_STREAM_HISTORY=160 shall be a power of two. Maybe this is not very explicit at the RMW level (we will fix it), but it is in the XRCE-DDS documentation: look for uxr_create_output_reliable_stream here.

Probably this is why your XRCE headers are being mixed with the payload.

Also, I guess that you have micro-ROS multithread flags enabled, right?

pablogs9 avatar Nov 15 '22 11:11 pablogs9

RMW_UXRCE_STREAM_HISTORY=160 shall be a power of two.

Ok. That's something I'd missed apparently.

Maybe this is not very explicit at the RMW level (we will fix it)

yeah, I just checked and couldn't find it.

it is in the XRCE-DDS documentation: look for uxr_create_output_reliable_stream here.

True.

However, as a user of micro-ROS, I don't call that function anywhere, nor do I use the Client API directly myself. I only interact with the rmw_microxrcedds API. So I hadn't seen it.

Would perhaps be good to both check it and error out in the RMW and update the docs for these options in the CMakeLists.txt of rmw_microxrcedds_c?

I guess that you have micro-ROS multithread flags enabled, right?

Yes. UCLIENT_PROFILE_MULTITHREAD=ON.


Edit: we've also been using this value for about 6 months, and had not encountered any problems.

I'll test a build with a power-of-two value for RMW_UXRCE_STREAM_HISTORYand report back.

gavanderhoorn avatar Nov 15 '22 11:11 gavanderhoorn

And some bounds checking on deserialised values might also be nice, but I guess that's hard to do: in some cases a vector with 1157681529 elements will probably be perfectly OK.

(but this is not something specific to this RMW, nor micro-ROS)

gavanderhoorn avatar Nov 15 '22 11:11 gavanderhoorn

Yes, I agree that this shall be written somewhere, but it is not common for micro-ROS users to modify it and we have had no bandwidth to take care of it.

pablogs9 avatar Nov 15 '22 11:11 pablogs9

I can PR the CMake options docs. The Client would take me longer.

gavanderhoorn avatar Nov 15 '22 11:11 gavanderhoorn

I can PR the CMake options docs. The Client would take me longer.

Appreciate it

pablogs9 avatar Nov 15 '22 11:11 pablogs9

https://github.com/micro-ROS/rmw_microxrcedds/pull/270

gavanderhoorn avatar Nov 15 '22 11:11 gavanderhoorn

Closing as the problem has not come back after changing STREAM_HISTORY to be a power-of-two.

Thanks for the fast response @pablogs9, even if this in the end was not an issue with rmw_fastrtps_cpp.

gavanderhoorn avatar Nov 24 '22 16:11 gavanderhoorn

In any case, IMO this behavior of rmw_fastrtps_cpp or type support shall be fixed.

pablogs9 avatar Nov 25 '22 06:11 pablogs9