rclcpp icon indicating copy to clipboard operation
rclcpp copied to clipboard

Logging is kind of slow, even at levels below the level selected for the node.

Open jpace121 opened this issue 6 months ago • 4 comments

I have a minimal reproducible example here.

In that example, I start a timer that only runs once that does a calculation in a for loop, logging as it goes. I measure the time it takes to do the calculation and also log that at the end. I run the process with the logger level for the node set to WARN, so the INFO statements aren't printed to the terminal or published.

I would expect that I would see a minimal slow down compared to not logging at all (ignoring missing optimizations), when in reality I see quite a large impact.

On my laptop, for the scenario described above I'm seeing the process take 0.0149388s to complete, where the equivalent code in ROS1 completed the process in 0.000248744s.

FWIW, its not just that logging is slower in ros2 than ros1. When I set the log levels so I can see the statements, ros1 finishes in 1.04641s compared to ros2 at 0.895799s. If I subscribe to rosout, the time for ros2 doubles to 1.62014s, though.

This seems unfortunate? I discovered this when porting some code from ros1 to ros2 that had log statements in some critical sections of the code, and wound up fixing the issue by using spdlog directly for those sections.

The fact that subscribing to /rosout has such a big effect is even more unfortunate, in my opinion. It is an additional place that people coming from ros1 don't expect counts as making a network call that really is when doing subsystem testing and verification.

Is there a way this can be improved? The ros1 behavior seems much better

jpace121 avatar Aug 08 '25 12:08 jpace121

This issue has been mentioned on Open Robotics Discourse. There might be relevant details there:

https://discourse.openrobotics.org/t/ros-pmc-minutes-for-august-12-2025/49548/1

ros-discourse avatar Aug 12 '25 21:08 ros-discourse

IMO, there are some more things could be related to the logging performance. Most of major parts seemed already discussed above, but the followings could affect it too.

  • publishing mode is dependent on rmw implementation that the application uses. (AFAIK, rmw_fastrtps uses async publish mode in default, which can be configured environmental variable.) so the queue and buffering mechanism can be already used with some rmw implementations. (i believe that rmw_cyclonedds only supports the synchronous publish mode.)
  • it accesses the hash table to see the logging level to filter out the message on every single time it logs the message. this could be the scalability problem. the more log levels we have, the more complexity to handle the hash table. probably we can enhance the efficiency to notify the logger if it needs to access the hash table on the write operation. i am assuming that most likely it does not even need to access the hash table to log the message.

fujitatomoya avatar Aug 13 '25 02:08 fujitatomoya

@fujitatomoya We discussed this shortly in the PMC meeting yesterday. A few questions came up :

  • should be have a log queue and a second thread for publishing by default ?
  • Should we modify the log msg to allow a array msg
  • Should we reduce the frequency of the publishing and use the array msg

What is your opinion on this ?

jmachowinski avatar Aug 13 '25 08:08 jmachowinski

sorry for being late to get back to this.

i think that is reasonable and useful to have log queue mechanism for rosout to address this issue.

there would be more configurable options for user about rosout including async thread, it can obviously reduce the non-cost effective API call to the rmw implementation to publish the data one by one. (this is different from rmw sync/async mode, and still beneficial for some cases.)

i would say downside could be, rosout subscriber could not receive the log messages in time series. this becomes just likely depending on user configuration for caching period. but this is not even guaranteed by current design either, i believe that if rmw implementation uses the async publisher mode, this could still happen with current implementation.

to be honest, we would not use rosout for the production environment, probably disable that. rosout topic can probably work with standalone single instance ROS 2 system, but once it comes to distributed system like 10-100 systems in ROS 2 network there will be scalability issue for sure. for that, we probably use external handler to handle the logging information including step server, log sink(file, pipe or network) based on log level.

fujitatomoya avatar Aug 18 '25 09:08 fujitatomoya