js-ceramic icon indicating copy to clipboard operation
js-ceramic copied to clipboard

feat: add ability to publish metrics as ceramic model instance documents (rebased)

Open gvelez17 opened this issue 1 year ago • 7 comments

Description

This PR enables publishing metrics to the ceramic network as model instance documents. It uses the same count and observe semantics as ServiceMetrics, with a configurable interval for publishing.

Key Design notes

  • avoids circular dependencies. the observability model-metrics package accepts a ceramic api instance at run time, so never imports any js-ceramic common packages.

  • singleton pattern. because of the instance import and that its instantiated as a singleton, you can import model-metrics from a code path where the ceramic instance is not accessible, and still use it; also no need to pass it around, just import

  • lightweight functions. the count observe and recordError functions are very lightweight, and only update variables in memory. At the configured interval time, a summary will be published as a model metric.

These designs are in the model-metrics package in the observability repo which is a dependency of this PR. The code is separate from js-ceramic but the npm package is published under @ceramicnetwork/model-metrics

How Has This Been Tested?

Describe the tests that you ran to verify your changes. Provide instructions for reproduction.

  • [x ] Tested manually with a local node on dev-unstable, verified that metrics are correctly produced.

To reproduce this test, add the following into daemon.config.json and run your node.

"metrics": {
    "metrics-publisher-enabled": true
  }
  • [ x] Tested Manually with a harness at https://github.com/3box/runbooks-and-oncall/tree/main/MetricsDemo (see readme for instructions)

PR checklist

Before submitting this PR, please make sure:

  • [x] I have tagged the relevant reviewers and interested parties
  • [ ] I have updated the READMEs of affected packages
  • [ ] I have made corresponding changes to the documentation

References:

Please list relevant documentation (e.g. tech specs, articles, related work etc.) relevant to this change, and note if the documentation has been updated.

gvelez17 avatar Mar 20 '24 23:03 gvelez17

For easy reference, here is a sample of the metrics currently produced:

{
  "streamId": "kjzl6kcym7w8y6ghqfmaqe9m94obsmoekktwdnblu1u7fvhcljvcvnmqhvd4a4m",
  "state": {
    "type": 3,
    "content": {
      "ts": "2024-03-22T15:02:19.384Z",
      "ceramicNode": {
        "id": "CAESIHWY0nyXqZTZO2as2Dli/USgf+TRcYTuljaiVc1rifhV",
        "name": "0.0.0.0",
        "PeerID": "CAESIHWY0nyXqZTZO2as2Dli/USgf+TRcYTuljaiVc1rifhV",
        "IPAddress": "0.0.0.0",
        "ipfsVersion": "0.18.1",
        "nodeAuthDID": "did:key:z6Mkqot5TB3v7kd7UjezicHrnHvUE7ikN8b763xpE3U9gR6d",
        "ceramicVersion": "5.5.0-rc.0"
      },
      "recentErrors": 0,
      "sampleRecentErrors": [],
      "totalIndexedModels": 0,
      "totalPinnedStreams": 0,
      "maxAnchorRequestAgeMS": 706236,
      "currentPendingRequests": 0,
      "meanAnchorRequestAgeMS": 648052,
      "recentCompletedRequests": 3
    },
    "metadata": {
      "controllers": [
        "did:key:z6Mkqot5TB3v7kd7UjezicHrnHvUE7ikN8b763xpE3U9gR6d"
      ],
      "model": "kjzl6hvfrbw6c7kjl8p225jzy0mjxqo5xdcdt7eqsjv0519arqfda598zfgpfx9",
      "unique": "bwOxAanX3RqNTWMc"
    },
    "signature": 2,
    "anchorStatus": "PENDING",
    "log": [
      {
        "cid": "bagcqcerah4z6t76dsy44xohfhnz2465dth3sgpvsbwxgsarhhc6mbfqouvda",
        "type": 0
      }
    ],
    "doctype": "MID"
  }
}

The code that produces them is in https://github.com/ceramicnetwork/observability/tree/master/packages/model-metrics/src

which is included as @ceramicnetwork/model-metrics

Note that it uses a singleton pattern to avoid passing around a pointer to the ceramic api and metrics instance.

gvelez17 avatar Mar 22 '24 16:03 gvelez17

Note that 'totalPinnedStreams' is part of the schema, but is not currently being recorded as we do not have a way to get this out of leveldb afaik.

'0' values are currently also 'no data' values, we could change that to '-1' or something clearer

gvelez17 avatar Mar 22 '24 17:03 gvelez17

IPAddress just has the value of 0.0.0.0 currently, which is likely what was given as the arg for the process to bind to, but isn't actually the node's real public IP address. Can we get the node's actual public IP? This may need to be changed to a list since one node can have multiple IPs if it's listening to multiple networks.

Alternatively, we may just want to remove this field entirely. This actually may fall under the category of information that devs wont want to be published publicly

stbrody avatar Mar 22 '24 17:03 stbrody

published

I agree collecting IP is generally considered sensitive information. We can use PeerId to uniquely identify how many nodes are on the network etc without collecting which IP they come from.

nathanielc avatar Mar 22 '24 18:03 nathanielc

'0' values are currently also 'no data' values, we could change that to '-1' or something clearer

How about null to mean "no data"? I do think we will probably have semantically meaningful 0 values, so would be good to use something else to mean no data.

stbrody avatar Mar 22 '24 18:03 stbrody

published

I agree collecting IP is generally considered sensitive information. We can use PeerId to uniquely identify how many nodes are on the network etc without collecting which IP they come from.

agreed, tho having a public endpoint for the node if there is one could be super valuable to have - this could be optional public endpoint?

gvelez17 avatar Mar 24 '24 23:03 gvelez17

Overall I like the design. Its simple and straightforward. I have some questions below.

Also as an alternative approach if we wanted to make this more generic for any kinds of metrics instead of Ceramic specific metrics.

OpenTelemetry defines a specific structure for metric data. See their protobuf definitions. We don't need all the complexity represented in the protobuf but the general design would be to define some models like:

interface MetricEvent
  @createModel(description: "Required metric event content interface") {
  ts: DateTime!
  metrics: [Metric]
}

type Metric {
    attributes: [Attribute]
    name: String
    value: Float  
}

type Attribute {
   key: String
   value: String
}

The idea here is you can have an arbitrary set of metrics without ever having to change the schema. The downside is its not guaranteed at the schema level which metrics are present. However generally this works well in practice as conventions arise around which metrics are useful/valuable.

To be clear, I think this PR can/should merge as is (after address all feedback) but we should consider this design when we want to generalize this workflow.

ok 100%, I would love to do this as version 2

I did have a generic blob one called GenericMetricEvent here https://github.com/ceramicnetwork/observability/blob/master/packages/model-metrics/src/composites/00-simpleNodeMetrics.graphql#L18 but went with an explicit schema for this round

Captured in https://linear.app/3boxlabs/issue/WS3-568/generalize-metrics-to-protobuf-like-schema

gvelez17 avatar Mar 25 '24 20:03 gvelez17

hey @stbrody i think this and the observability change address the two things we agreed to change for this iteration; one question came up i referred to it here - https://github.com/ceramicnetwork/observability/pull/25/files - about whether we should be using the same model instance id across networks, or not.

gvelez17 avatar Mar 27 '24 22:03 gvelez17