[Question] Is it possible to provide a time_since_last_update metric in xDS subscription statistics?
I am aware of the update_time metric. My team wants to use it to observe whether instances in our fleet are receiving xDS updates in a timely fashion, which is especially helpful in gray failure scenarios. What we really want to alarm on is time_since_last_update.
However, because update_time is an epoch timestamp, it complicates monitoring/alerting for us. I suspect this is a complication in other alerting systems as well. For instance:
- We are able to monitor computed metrics in our system, however the system does not support a System.now() metric, so we cannot do a System.now() - update_time computation.
- After a node goes down, our metrics system caches the gauges it last reported and they continue to appear on the timeline for a while. So, it is difficult to distinguish nodes going down from nodes going stale for xDS updates due to other reasons. If we had a time_since_last_update metric instead, this wouldn't be a problem.
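To make the computation in the list above concrete, here is a minimal Python sketch of deriving time_since_last_update outside of Envoy, from a text dump of stats in `name: value` form. It is an illustration only, not an existing Envoy feature. It assumes that update_time gauges are reported in milliseconds since the epoch and that a value of 0 means no update has happened yet. The function name and the sample stat values are hypothetical.

```python
import time


def time_since_last_update(stats_text, now_ms=None):
    """Return {stat_name: seconds since last update} for each *.update_time gauge.

    Assumption: update_time values are milliseconds since the epoch.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    ages = {}
    for line in stats_text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name.endswith(".update_time"):
            continue
        try:
            updated_ms = int(value.strip())
        except ValueError:
            continue  # skip values that are not plain integers
        if updated_ms == 0:
            continue  # assumption: 0 means no successful update yet
        ages[name] = (now_ms - updated_ms) / 1000.0
    return ages


# Hypothetical sample input; real stat names and values will differ per deployment.
SAMPLE = """\
cluster_manager.cds.update_success: 12
cluster_manager.cds.update_time: 1700000000000
listener_manager.lds.update_time: 1700000030000
"""

ages = time_since_last_update(SAMPLE, now_ms=1700000060000)
for stat, seconds in sorted(ages.items()):
    print(f"{stat}: {seconds:.0f}s since last update")
```

A sidecar script like this still needs a wall clock next to the node, which is exactly the part our metrics system cannot provide, so it does not replace a native time_since_last_update metric.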
My main question is - does this metric look reasonable to you, or is there a reason this was not provided in the first place?
@kyessenov since you labeled this issue, does this mean it is doable and there is no strong reason against it (in terms of current design/flow)? Knowing that would help, and perhaps we can contribute.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
Envoy metrics have evolved over time and don't necessarily follow best modern practices. I think it's reasonable to propose usability improvements in the area of xDS status monitoring.
I am working on this.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.