librealsense

add heartbeat-period and default to half reply timeout

Open · maloel opened this issue 1 year ago · 0 comments

We encountered rare cases where the camera seems not to receive a control sent from the host. Worse still, the control message (which is on a reliable topic) ends up being asked to be resent.

Investigation showed that, in rare cases, the control message may not be "announced" to remote participants in time. This announcement relies on the heartbeat mechanism: if a heartbeat is not sent before our timeout expires (on either the host or the camera), we get an error and end up in a bad state.

We have a reply timeout of 2000ms (by default) for every control message. Digging into the QoS for the control writer, we found a way to set the heartbeat period and saw that its default is 3 seconds (3000ms). Because the timeout is shorter than the heartbeat period, the whole timeout can elapse within a single period: we may time out before a heartbeat is even sent out!

  • Increase our timeout default to 2500ms
  • Use a heartbeat period that's one half of our timeout
  • Allow overriding the control/heartbeat-period value

Tracked on [RSDEV-2841]

maloel · Oct 20 '24 07:10