Azure_Kinect_ROS_Driver icon indicating copy to clipboard operation
Azure_Kinect_ROS_Driver copied to clipboard

Potential Bug - Profile Performance of using Sleep in the frame loop and its effects on the longterm frame stability

Open jmachowinski opened this issue 5 years ago • 5 comments

Hi, I'm confused about some parts of the code :

https://github.com/microsoft/Azure_Kinect_ROS_Driver/blob/01b5a781f499b32d2babed97a534537b954e0d69/src/k4a_ros_device.cpp#L1181

Why is this done here ? AFAIK there is already the spin in the main thread. Also I do not know it if it is even allowed to call spin from multiple threads.

https://github.com/microsoft/Azure_Kinect_ROS_Driver/blob/01b5a781f499b32d2babed97a534537b954e0d69/src/k4a_ros_device.cpp#L1182

This rate looks kind of dangerous to me. If for some reason, the rate on the sensor would be a bit faster than the PC time, we would start accumulate a delay and if run very long, start to pile up measurements. I assume, this rate should only be present in the code path of the playback ?

Any clarification would be appreciated :-)

jmachowinski avatar Sep 18 '20 15:09 jmachowinski

I am not that expirienced with multithreaded nodes. What would be the correct way to handle the spinning? Creating an asyncspinner and start it in the mainthread and then wait there for shutdown?

RoseFlunder avatar Sep 18 '20 16:09 RoseFlunder

Thank you for the report. A few questions:

  • Are you observing a specific issue related to this? such as an unexplained crash or frame backup?

Discussion: ROS::spin internally services the global callback queue, which itself is thread safe. However, it will dispatch a callback on the thread which is calling spin. This means that the handlers of the callback need to be thread safe and reentrant.

The Main thread is sitting in a spin loop, servicing callbacks - so the global callback queue is being serviced from two different places.

Since the main thread is servicing callbacks, I think that the spin in frame publish is redundant, but I don't necessarily see a problem with it.

I think the rate throttle logic can be optimized - It should only sleep for the remainder of the FPS, not the full timeout.

Action:

  • Sleep the remainder of the FPS delta (Opened separate issue): https://github.com/microsoft/Azure_Kinect_ROS_Driver/issues/150
  • Investigate 'threadiness' of spin.

Thanks for finding this

ooeygui avatar Sep 18 '20 19:09 ooeygui

@ooeygui Actually ros::Rate::sleep() already accounts for the processing overhead and sleeps for only the leftover of the sampling period.

bool sleep() Sleeps for any leftover time in a cycle. Calculated from the last time sleep, reset, or the constructor was called.

I believe the original poster's concern with loop_rate.sleep() is that if PC clock somehow runs a bit slower than the sensor clock, there would be no chance for this loop to catch up with the data being produced by the sensor in the long run and/or maintain a minimum delay for the message publication.

zchenpds avatar Sep 18 '20 20:09 zchenpds

@zhenpds thanks for pointing that out. I need to profile this portion of the code and ensure it is meeting the publish rate.

ooeygui avatar Sep 18 '20 21:09 ooeygui

So the question is do we really need a sleep at all if we are not in Playback mode? The "getCapture" could be the pseudo sleep because it is called with infinite timeout, thus it will block until the next frame from the camera is available right?

RoseFlunder avatar Sep 19 '20 02:09 RoseFlunder