ros2clifecycle_set and ros2service_call_change_state in lifecycle management is too slow.
Bug report
Required Info:
- Operating System:
- ubuntu22.04
- Installation type:
- binary
- Version or commit hash:
- Humble
- DDS implementation:
- rmw_fastrtps_cpp
- Client library (if applicable):
- ros2cli
Steps to reproduce issue
the simple lifecycle node can be seen in https://github.com/micro-ROS/system_modes/tree/master/system_modes_examples
time ros2 lifecycle set /drive_base activate
Transitioning successful
real 0m2.030s
user 0m0.334s
sys 0m0.113s
time ros2 lifecycle set /drive_base deactivate
Transitioning successful
real 0m2.089s
user 0m0.419s
sys 0m0.083s
time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))
response:
lifecycle_msgs.srv.ChangeState_Response(success=True)
real 0m2.128s
user 0m0.379s
sys 0m0.082s
time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 4, label: deactivate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=4, label='deactivate'))
response:
lifecycle_msgs.srv.ChangeState_Response(success=True)
real 0m2.776s
user 0m0.343s
sys 0m0.096s
Additional information
repeated test of the same cmd can make it quicker, see the complete test log below
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))
response:
lifecycle_msgs.srv.ChangeState_Response(success=True)
real 0m2.128s
user 0m0.379s
sys 0m0.082s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))
response:
lifecycle_msgs.srv.ChangeState_Response(success=False)
real 0m1.231s
user 0m0.536s
sys 0m0.117s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 4, label: deactivate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=4, label='deactivate'))
response:
lifecycle_msgs.srv.ChangeState_Response(success=True)
real 0m2.776s
user 0m0.343s
sys 0m0.096s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))
response:
lifecycle_msgs.srv.ChangeState_Response(success=True)
real 0m1.017s
user 0m0.354s
sys 0m0.075s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 4, label: deactivate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=4, label='deactivate'))
response:
lifecycle_msgs.srv.ChangeState_Response(success=True)
real 0m2.737s
user 0m0.560s
sys 0m0.113s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))
response:
lifecycle_msgs.srv.ChangeState_Response(success=True)
real 0m3.112s
user 0m0.370s
sys 0m0.076s
zs@zs-vm-2204:~/zs_ws/ga_ros$
Thanks.
May be the right way to manage lifecycle of a node is to write cpp lifecycle manager client?
May be the right way to manage lifecycle of a node is to write cpp lifecycle manager client?
I am not sure if this is right for everyone, it depends on the use case and requirement.
I am using 1th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz, and lifecycle_talker(w/o this line) can respond approximately 1 secnod.
root@tomoyafujita:~/ros2_ws/colcon_ws# time ros2 lifecycle set /lc_talker activate
Transitioning successful
real 0m1.034s
user 0m0.592s
sys 0m0.976s
root@tomoyafujita:~/ros2_ws/colcon_ws# time ros2 lifecycle set /lc_talker deactivate
Transitioning successful
real 0m1.030s
user 0m0.547s
sys 0m0.997s
root@tomoyafujita:~/ros2_ws/colcon_ws# time ros2 lifecycle set /lc_talker activate
Transitioning successful
real 0m1.043s
user 0m0.583s
sys 0m0.916s
root@tomoyafujita:~/ros2_ws/colcon_ws# time ros2 lifecycle set /lc_talker deactivate
Transitioning successful
real 0m1.040s
user 0m0.588s
sys 0m0.971s
I do not really know the requirement or expectation in performance perspective for this issue, but 1 second is reasonable for me. And this will also depends on the network configuration and node graph.
the reason that ros2 lifecycle set takes time is that it does many things under the hood.
- get all node name from node graph to make sure that specified node exists in the graph.
- get available transitions for that node to make sure user request is appropriate.
- call change_state service for that node, and wait for the response.
@fujitatomoya thanks for your quick reply.
I do not really know the requirement or expectation in performance perspective for this issue, but 1 second is reasonable for me.
My use case is that a manager of state machine would start a thread executing a launch.py to configure, activate, and deactivate some lifecycle nodes(with executing process ros2 lifecycle set), and in my case, this thread takes around 7-10 sesonds to finish the management of 6 lifecycle nodes, if run into error, the launch thread even stucks and cannot return.
see https://github.com/robosoft-ai/SMACC2/issues/330 if you like
And the launch thread often run into the error with node not found, see #582 and you've already known
the reason that ros2 lifecycle set takes time is that it does many things under the hood.
- get all node name from node graph to make sure that specified node exists in the graph.
- get available transitions for that node to make sure user request is appropriate.
- call change_state service for that node, and wait for the response.
I see,
- I think all cmds of ros2cli depend on the node graph, or at least
ros2 lifecycle set,ros2 service calland other cmds taking operations on nodes, how about directly writing rclcpp service clients to call service in cpp? Do you think it will bypass thenode listissue? - I think
ros2 service call /node/change_statewould be quicker thanros2 lifecycle set, but there is no evidence in my test log, see logs in the top comment.
It has been tested that the rclcpp::client<lifecycle_msgs::srv::ChangeState> is faster enough to manage lifecycle of the node than ros2 lifecycle set and ros2 service call /node/change_state
There may be a system pitfall in the ros2cli and/or rclpy.
how about directly writing rclcpp service clients to call service in cpp?
This could be quicker for initialization, and i think sometimes we do this kind of init performance improvement like replacing scripts into compiled code. (e.g replace bash script into C compiled binary.) probably this is something makes your performance requirement.
but i am not inclined to take this path for mainline, since this will increases burden for code maintenance. I think current ros2cli is really flexible to insert the 3rd party command and maintain.
Do you think it will bypass the node list issue?
No i do not think so, it is not about how fast it is, but where it is implemented. Node graph comes from underlying rmw implementation, so this will not bypass the node list issue, i believe.
There may be a system pitfall in the ros2cli and/or rclpy.
it just takes some time to check all the arguments are correct, such as node, topic and service names.
CC: @clalancette @wjwwood @ivanpauno
No i do not think so, it is not about how fast it is, but where it is implemented. Node graph comes from underlying rmw implementation, so this will not bypass the node list issue, i believe.
from https://github.com/ros2/ros2cli/issues/582#issuecomment-776539070
Since rclcpp doesn't depend on daemon(graph cache), if rclcpp service client will bypass the issue(evidence from my test in https://github.com/ros2/ros2cli/issues/779#issuecomment-1315117834), the root cause would be in the daemon-graph-cache rather than rmw-graph.
Since rclcpp doesn't depend on daemon(graph cache)
neither rclpy. ros2cli has NodeStrategy to rely on ros2 daemon to reduce the time to get the connectivity map, but this can be opt-out.
the root cause would be in the daemon-graph-cache rather than rmw-graph.
i guess that this could be, it would be worth to try.