
Add health variables to components model.

Open robshakir opened this issue 4 years ago • 2 comments

commit 54c8618962211fe8334c45d531f79c1928689070
Author: Rob Shakir <[email protected]>
Date:   Sat Oct 16 09:19:36 2021 -0700

    Add telemetry-on-change annotation to health variables.
    
     * (M) release/yang/platform/openconfig-platform.yang
      - Add extension indicating that the health variables should be
        sent based on when they change.

commit 987f504239beecf7e6081f9ff276afd1382b1628
Author: Rob Shakir <[email protected]>
Date:   Sat Oct 16 09:17:06 2021 -0700

    Add component health indicators.
    
     * (M) release/yang/platform/openconfig-platform.yang
      - Per the definition of the gnoi.Healthz service in
        github.com/openconfig/gnoi - indicate whether a component is
        healthy or unhealthy following a check on the system. This
        allows the system to signal to external subscribers that the
        component is not in a healthy state such that further
        actions can be taken.

See the gnoi.Healthz documentation for discussion of how these variables interact with the healthz service.
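
To illustrate the on-change behaviour described in the commits above, here is a minimal sketch in Python. It is not the YANG model or any gNMI/gNOI API: `ComponentHealth`, `HealthStatus`, `record_check`, and the `UNSPECIFIED` value are hypothetical names chosen for illustration. It only shows the idea that a subscriber hears about a component's health when the value changes, not on every check.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class HealthStatus(Enum):
    # HEALTHY/UNHEALTHY follow the PR description; UNSPECIFIED is an
    # assumption added here to represent "no check has run yet".
    UNSPECIFIED = 0
    HEALTHY = 1
    UNHEALTHY = 2


@dataclass
class ComponentHealth:
    """Hypothetical stand-in for a component's health state."""
    name: str
    status: HealthStatus = HealthStatus.UNSPECIFIED
    subscribers: List[Callable[[str, HealthStatus], None]] = field(default_factory=list)

    def record_check(self, result: HealthStatus) -> bool:
        """Store a check result and notify subscribers only if it changed.

        Returns True when an update was sent, False otherwise.
        """
        if result == self.status:
            return False  # unchanged value: nothing is sent
        self.status = result
        for notify in self.subscribers:
            notify(self.name, result)
        return True


# Usage: three checks, but only two value changes reach the subscriber.
updates: List[Tuple[str, HealthStatus]] = []
linecard = ComponentHealth(name="linecard-0")
linecard.subscribers.append(lambda name, status: updates.append((name, status)))

for result in (HealthStatus.HEALTHY, HealthStatus.HEALTHY, HealthStatus.UNHEALTHY):
    linecard.record_check(result)
```

The repeated `HEALTHY` result produces no update, which is the property the telemetry-on-change annotation is meant to signal to subscribers.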

robshakir avatar Oct 16 '21 16:10 robshakir

@marcushines @sulrich - PTAL

robshakir avatar Oct 16 '21 16:10 robshakir

Compatibility Report for commit ad306ac34dadb8d24a357983a1933d94e05a70f5: ⛔ yanglint@SO 1.10.17

OpenConfigBot avatar Oct 16 '21 16:10 OpenConfigBot

No major YANG version changes in commit ad306ac34dadb8d24a357983a1933d94e05a70f5

OpenConfigBot avatar Jan 23 '23 20:01 OpenConfigBot

Minor suggestion here, but the platform model is getting large. The health variables seem like they could be placed in a separate file and augment component state? WDYT?

dplore avatar Jan 25 '23 01:01 dplore

Sure, that's fine with me. I don't think this materially changes anything.

I still think that we should cut 1.0.0 of the openconfig-platform model though. I can do this in a separate PR.

robshakir avatar Jan 25 '23 01:01 robshakir

> Sure, that's fine with me. I don't think this materially changes anything.
>
> I still think that we should cut 1.0.0 of the openconfig-platform model though. I can do this in a separate PR.

Agreed on all points. The benefits are a smaller file and more modular models, which gives implementations another, cleaner way to express support on a per-module (file) basis.

dplore avatar Jan 25 '23 01:01 dplore

@dplore - done. Can we establish an expected merge date for this PR please?

robshakir avatar Jan 25 '23 02:01 robshakir

This was presented at the OpenConfig operators meeting today, with no specific comments for or against.

There was extensive discussion on the topic of alarms and thresholds, and an observation that healthz is also an alarm mechanism (a tree of boolean values indicating the health of an entity). This was inspired by several PRs related to thresholds and alarms. We observe that there are several different patterns for alarm/health indication, ranging from booleans and thresholds embedded as leaves within the relevant component to the generic (but opaque) /system/alarms model. Each of these approaches has its own tradeoffs.
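
As a rough illustration of the "tree of boolean values" view of healthz mentioned above, here is a small Python sketch. The names (`HealthNode`, `rolled_up`, `unhealthy_paths`) and the component names are hypothetical, and the roll-up rule (a node counts as healthy only if it and all of its descendants are healthy) is an assumption made for the example, not something specified by healthz or the platform model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HealthNode:
    """Hypothetical node in a tree of per-entity health booleans."""
    name: str
    healthy: bool = True
    children: List["HealthNode"] = field(default_factory=list)

    def rolled_up(self) -> bool:
        # Assumed rule: healthy only if this node and every descendant is healthy.
        return self.healthy and all(child.rolled_up() for child in self.children)

    def unhealthy_paths(self, prefix: str = "") -> List[str]:
        # Collect the path of every node whose own boolean is False.
        path = f"{prefix}/{self.name}"
        found = [] if self.healthy else [path]
        for child in self.children:
            found.extend(child.unhealthy_paths(path))
        return found


# Usage: one unhealthy port makes the whole chassis roll up as unhealthy.
chassis = HealthNode("chassis", children=[
    HealthNode("linecard-0", children=[
        HealthNode("port-0"),
        HealthNode("port-1", healthy=False),
    ]),
    HealthNode("fan-0"),
])

overall = chassis.rolled_up()
failing = chassis.unhealthy_paths()
```

Embedding the boolean in each component keeps the failing entity identifiable by its path, whereas a single generic alarm list would need to carry that location as data.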

OpenConfig could benefit from a more opinionated view on how alarms and thresholds should be implemented, so that we have a more predictable and consistent pattern. However, until we have rough consensus on a pattern to follow, I don't see a reason to block development.

Therefore, I will ask for a last call for comments on this PR, which is to be merged in 2 weeks, on Feb 7, 2023.

dplore avatar Jan 25 '23 05:01 dplore

/gcbrun

dplore avatar Feb 08 '23 17:02 dplore