/ping may be affected by hanging RTSP processes — could impact overall core responsiveness

Open AlexeyBoiler opened this issue 6 months ago • 1 comments

Hi,

We’ve observed an issue where the /ping endpoint of datarhei/core becomes unavailable or responds with ECONNRESET. So far this has only happened while one of the RTSP camera processes is stuck or its source is unreachable.

What we’re seeing

-	When a camera drops (e.g., offline, frozen, or rebooting), the corresponding process in core starts hanging.
-	At that same moment, /ping, which usually responds instantly, starts timing out or returning read ECONNRESET.
-	This is being used as a health check from external systems like n8n, and it makes the instance look unhealthy.
-	It seems like this only happens when the camera is down; otherwise, core works flawlessly.

Our assumptions and concerns

We suspect that /ping might be internally referencing or waiting on information from active processes — even if they’re stuck waiting for an RTSP source to respond. If this is the case, then:

A single faulty RTSP input could degrade or even block the responsiveness of the entire core instance. We have observed a recurring issue when using UDP push sources with MPEG-TS streams.

This could become a hidden bottleneck or even a denial-of-service vector if multiple processes are misbehaving or unresponsive.

-	We rely on /ping as a lightweight health check. If it fails, our automation (camera monitors, process supervisors, etc.) assumes core is down.
-	Even though core is alive and trying to recover, the failed /ping causes upstream workflows to crash or retry.
-	This creates system-wide instability, despite the failure being isolated to one RTSP process.
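As a stopgap on the monitoring side, the health check could retry and record what kind of failure it saw, so that a single reset or timeout does not immediately mark core as down. This is only a minimal sketch and not a fix for the underlying behaviour. The names fetch_ping, classify and probe are hypothetical, and the URL, timeout and retry counts are assumptions to be adapted.

```python
import socket
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_ping(url: str, timeout: float = 2.0) -> int:
    """Perform one GET and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def classify(exc: BaseException) -> str:
    """Map a low-level error to a coarse label for alerting."""
    # urllib wraps many socket errors in URLError; unwrap to inspect the cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, ConnectionResetError):
        return "reset"
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    return "error"


def probe(fetch: Callable[[], int], attempts: int = 3, delay: float = 1.0) -> dict:
    """Call fetch up to `attempts` times; report healthy on the first HTTP 200."""
    failures = []
    for i in range(attempts):
        try:
            status = fetch()
            if status == 200:
                return {"healthy": True, "failures": failures}
            failures.append(f"http_{status}")
        except Exception as exc:
            failures.append(classify(exc))
        if i < attempts - 1:
            time.sleep(delay)  # brief pause before the next attempt
    return {"healthy": False, "failures": failures}


if __name__ == "__main__":
    # Assumed address; replace with the real core host and port.
    print(probe(lambda: fetch_ping("http://localhost:8080/ping")))
```

The failures list keeps the per-attempt labels, which would also help tell the ECONNRESET case apart from plain timeouts when reporting back on this issue.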

Log:

ts=2025-08-06T06:55:59Z level=INFO component="Process" msg="Started" id="core2_sub-push"
ts=2025-08-06T06:56:00Z level=INFO component="Session" msg="Closed" id="HTTP" location="any" peer="any" reference="" rx_bitrate_kbit=0 rx_bytes=556 rx_maxbitrate_kbit=0.3229166666666667 tx_bitrate_kbit=0 tx_bytes=19282 tx_maxbitrate_kbit=0 type="http"
ts=2025-08-06T06:56:00Z level=INFO component="Process" msg="Failed" id="core2_sub-push"
ts=2025-08-06T06:56:00Z level=INFO component="Process" msg="Stopped" id="core2_sub-push"

Aug 06 '25 07:08 AlexeyBoiler

After switching the stream source to a stable RTSP camera, we’ve confirmed that:

-	The stream is working without interruptions
-	There are no visible crashes or errors in Core logs
-	Core is reachable and streaming normally

However, we are still intermittently getting the following error from an external service (n8n) when performing a simple GET /ping request:

AxiosError: timeout of 50000ms exceeded Code: ECONNABORTED

This suggests that even when the stream is healthy and logs show no errors, the Core API occasionally becomes unresponsive or extremely slow to respond to basic HTTP requests.

Log:

ts=2025-08-06T09:05:20Z level=INFO component="Session" msg="Active" id="HTTP" location="any" peer="any" reference="" type="http"
ts=2025-08-06T09:06:03Z level=INFO component="Session" msg="Closed" id="HTTP" location="any" peer="any" reference="" rx_bitrate_kbit=0 rx_bytes=556 rx_maxbitrate_kbit=0.14375 tx_bitrate_kbit=0 tx_bytes=22801 tx_maxbitrate_kbit=17.81328125 type="http"
ts=2025-08-06T09:10:20Z level=INFO component="Session" msg="Active" id="HTTP" location="any" peer="any" reference="" type="http"
ts=2025-08-06T09:11:03Z level=INFO component="Session" msg="Closed" id="HTTP" location="any" peer="any" reference="" rx_bitrate_kbit=0 rx_bytes=556 rx_maxbitrate_kbit=0.14375 tx_bitrate_kbit=0 tx_bytes=23341 tx_maxbitrate_kbit=18.23515625 type="http"
ts=2025-08-06T09:15:33Z level=INFO component="Session" msg="Active" id="HTTP" location="any" peer="any" reference="" type="http"
ts=2025-08-06T09:16:03Z level=INFO component="Session" msg="Closed" id="HTTP" location="any" peer="any" reference="" rx_bitrate_kbit=0 rx_bytes=184 rx_maxbitrate_kbit=0 tx_bitrate_kbit=0 tx_bytes=0 tx_maxbitrate_kbit=0 type="http"
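To put a number on how long these HTTP sessions stay open, each "Active" entry can be paired with the next "Closed" entry and the timestamps subtracted. The sketch below assumes the entries are logfmt-style key=value pairs and that sessions with id="HTTP" do not overlap, since the log carries no per-session identifier. parse_line and session_durations are hypothetical helper names, and the sample lines are abbreviated copies of the entries above. It does not show which request each session belongs to.

```python
import re
from datetime import datetime

# key=value pairs where the value is either quoted or a bare token
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line: str) -> dict:
    """Split one log entry into a dict of its key=value fields."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def session_durations(lines) -> list:
    """Return seconds between each Session "Active" and the next "Closed"."""
    durations = []
    opened = None
    for line in lines:
        rec = parse_line(line)
        if rec.get("component") != "Session":
            continue
        ts = datetime.strptime(rec["ts"], "%Y-%m-%dT%H:%M:%SZ")
        if rec.get("msg") == "Active":
            opened = ts
        elif rec.get("msg") == "Closed" and opened is not None:
            durations.append((ts - opened).total_seconds())
            opened = None  # assumption: sessions do not overlap
    return durations


# Abbreviated copies of the entries from the log above.
sample = [
    'ts=2025-08-06T09:05:20Z level=INFO component="Session" msg="Active" id="HTTP" type="http"',
    'ts=2025-08-06T09:06:03Z level=INFO component="Session" msg="Closed" id="HTTP" type="http"',
    'ts=2025-08-06T09:10:20Z level=INFO component="Session" msg="Active" id="HTTP" type="http"',
    'ts=2025-08-06T09:11:03Z level=INFO component="Session" msg="Closed" id="HTTP" type="http"',
    'ts=2025-08-06T09:15:33Z level=INFO component="Session" msg="Active" id="HTTP" type="http"',
    'ts=2025-08-06T09:16:03Z level=INFO component="Session" msg="Closed" id="HTTP" type="http"',
]

print(session_durations(sample))  # [43.0, 43.0, 30.0]
```

For the three sessions logged above, the gaps are 43, 43 and 30 seconds.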

Aug 06 '25 09:08 AlexeyBoiler