Design telemetry approach for Juno
Create a design for the telemetry capability in Juno nodes, including:
- Consider libraries and utilities
- Ensure "opt-in" is available for Juno users
- Take into consideration the future P2P network monitoring capability
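One way to read the "opt-in" requirement is that telemetry stays off unless the user explicitly turns it on. A minimal sketch under that assumption follows; the variable name `JUNO_TELEMETRY_ENABLED` and the helper `telemetryEnabled` are hypothetical and not existing Juno settings.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// telemetryEnvVar is a hypothetical variable name, not an existing Juno setting.
const telemetryEnvVar = "JUNO_TELEMETRY_ENABLED"

// telemetryEnabled reports whether the user explicitly opted in.
// Unset, empty, or unparsable values all mean "disabled".
func telemetryEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(telemetryEnvVar)
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(raw)
	return err == nil && enabled
}

func main() {
	if telemetryEnabled(os.LookupEnv) {
		fmt.Println("telemetry: enabled by user")
		return
	}
	fmt.Println("telemetry: disabled (default)")
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the default-off behaviour easy to test.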
Use OpenTelemetry architecture (https://opentelemetry.io/docs/what-is-opentelemetry/)
- Design "signals" for Juno
- Traces
- All RPC calls?
- P2P operations?
- P2P message handler execution?
- Estimate the volume of telemetry data these traces would generate
- Metrics
- ...list needed...
- Logs: skip
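The volume estimate asked for under Traces can start as simple arithmetic. The sketch below multiplies request rate, spans per request, bytes per span, and sampling ratio. Every input value is a placeholder assumption, not a measurement from Juno.

```go
package main

import "fmt"

const secondsPerDay = 86400

// traceBytesPerDay estimates exported trace volume in bytes per day.
// All inputs are assumptions to be replaced with measured values.
func traceBytesPerDay(requestsPerSec, spansPerRequest, bytesPerSpan, sampleRatio float64) float64 {
	return requestsPerSec * spansPerRequest * bytesPerSpan * sampleRatio * secondsPerDay
}

func main() {
	// Placeholder inputs: 100 RPC req/s, 3 spans each, 500 B per span, 25% sampled.
	daily := traceBytesPerDay(100, 3, 500, 0.25)
	fmt.Printf("%.2f GB/day\n", daily/1e9) // prints "3.24 GB/day"
}
```

Under these placeholder numbers, tracing every RPC call without sampling would come to roughly 13 GB/day, which is why the sampling ratio is part of the estimate.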
Consider "passive" vs "active"
- "passive" - Juno exposes OT endpoints that an external agent scrapes (e.g. Prometheus or an OT Collector agent)
- "active" - Juno actively uploads telemetry data to a designated collector endpoint (see, for example, https://github.com/open-telemetry/opentelemetry-go/tree/main/example/otel-collector)
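The difference between the two modes can be sketched with the standard library alone. This is a stand-in for illustration, not the OpenTelemetry SDK: the counter `rpc_requests_total`, the `/metrics` path, and the function names are all hypothetical, and the collector is faked with `httptest`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// rpcRequests is a hypothetical counter; a real design would use OT metric instruments.
var rpcRequests atomic.Int64

// snapshot serialises the current counter values as JSON.
func snapshot() []byte {
	b, _ := json.Marshal(map[string]int64{"rpc_requests_total": rpcRequests.Load()})
	return b
}

// passiveHandler serves the snapshot to whoever scrapes it ("passive" mode).
func passiveHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.Write(snapshot())
}

// pushOnce uploads the snapshot to a collector URL ("active" mode).
func pushOnce(client *http.Client, collectorURL string) error {
	resp, err := client.Post(collectorURL, "application/json", bytes.NewReader(snapshot()))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("collector returned %s", resp.Status)
	}
	return nil
}

func main() {
	rpcRequests.Add(3)

	// Passive: an external agent issues the request and Juno only answers.
	rec := httptest.NewRecorder()
	passiveHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Println("scraped:", rec.Body.String())

	// Active: Juno initiates the upload to a stand-in collector.
	collector := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
	}))
	defer collector.Close()
	fmt.Println("push error:", pushOnce(collector.Client(), collector.URL))
}
```

In passive mode the node needs an open listening port but no knowledge of the collector. In active mode it needs a configured collector address and some policy for failed uploads.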
Implement OT instrumentation framework in Juno
- https://opentelemetry.io/docs/instrumentation/go/
Implement 1st iteration of traces and metrics
Measure the performance impact
- Benchmark node RPC performance with and without telemetry to quantify the additional infrastructure required to host RPC SaaS
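A rough sketch of such a with/without comparison is shown below, using `testing.Benchmark` from the standard library. The `handler` type, the `echo` function, and the `withTelemetry` wrapper are hypothetical stand-ins for a Juno RPC method and its instrumentation. A real measurement would run against the actual RPC server with the OT SDK enabled.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
	"time"
)

// handler stands in for a Juno RPC method; it is hypothetical.
type handler func(method string) string

// stats holds the telemetry a wrapper would collect.
type stats struct {
	calls atomic.Int64
	nanos atomic.Int64
}

// withTelemetry wraps h so every call is counted and timed.
func withTelemetry(h handler, s *stats) handler {
	return func(method string) string {
		start := time.Now()
		out := h(method)
		s.calls.Add(1)
		s.nanos.Add(int64(time.Since(start)))
		return out
	}
}

// echo is a trivial placeholder for real RPC work.
func echo(method string) string { return "ok:" + method }

// nsPerOp benchmarks h and reports nanoseconds per call.
func nsPerOp(h handler) int64 {
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			h("starknet_blockNumber")
		}
	})
	return res.NsPerOp()
}

func main() {
	var s stats
	plain := nsPerOp(echo)
	traced := nsPerOp(withTelemetry(echo, &s))
	fmt.Printf("plain=%dns/op traced=%dns/op calls=%d\n", plain, traced, s.calls.Load())
}
```

Because `echo` does almost no work, the relative overhead shown here will be far larger than for a real RPC method. The sketch only shows how the two configurations can be measured side by side.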
Design the OT Collector infrastructure
- Collector agent or
- Collector backend instance
Implement the OT Collector infrastructure
Implement Grafana integration for telemetry data
Develop Grafana dashboards
Also take into account the recent "quick telemetry" discussion in Brussels: https://github.com/NethermindEth/starknet-node-data-spec