load-watcher
Load watcher is a cluster-wide aggregator of metrics, developed for Trimaran: Real Load Aware Scheduler in Kubernetes.
Versions: load-watcher v0.2.3, Prometheus v2.50.1. `curl 10.105.174.136:2020/watcher` Self-monitored data. [https://github.com/paypal/load-watcher/issues/51](https://github.com/paypal/load-watcher/issues/51) This suggestion should be open to consideration.
Hi all, I found this line in the README: `kubectl create -f manifests/load-watcher-deployment.yaml`, but I could not find `manifests/load-watcher-deployment.yaml` in the repo. Maybe a sample deployment file is needed? Thanks.
Resolve go: unsupported GOOS/GOARCH pair linux/aarch64
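For context on this error: the Go toolchain names 64-bit ARM `arm64`, so `aarch64` (the name `uname -m` reports on Linux) is not accepted as a `GOARCH` value. The sketch below shows one way to translate machine names before setting `GOARCH`; `goarchFor` is a hypothetical helper written for illustration and is not taken from the load-watcher repository.

```go
package main

import "fmt"

// goarchFor translates a machine name, as printed by `uname -m`,
// into the GOARCH value the Go toolchain expects.
// Hypothetical helper for illustration; only two common architectures are covered.
func goarchFor(machine string) (string, bool) {
	switch machine {
	case "aarch64", "arm64":
		return "arm64", true // Go calls 64-bit ARM "arm64", never "aarch64"
	case "x86_64", "amd64":
		return "amd64", true
	}
	return "", false // unknown machine name: let the caller decide what to do
}

func main() {
	arch, ok := goarchFor("aarch64")
	fmt.Println(arch, ok)
}
```

A build script could use the same mapping before invoking `GOOS=linux GOARCH=arm64 go build`.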
* allow users to change filter keys (for host & cluster name) in SignalFx
* use pointers instead of values for metrics clients.

TEST DONE:
* verify metrics pulled to...
The current load-watcher Prometheus package uses the `instance:node_cpu:ratio` metric to calculate node utilization. However, when this value is still below 60%, I found another metric `instance:node_cpu_utilisation:rate1m` was...
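To compare the two metrics side by side, it can help to query each one through the Prometheus HTTP API (`/api/v1/query`) and read the per-node values. The sketch below only decodes an instant-query response into percentages. It assumes the metric is a ratio in [0, 1] and carries an `instance` label; `cpuPercentByInstance` and the sample payload are hypothetical and not part of load-watcher.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// promResponse mirrors the instant-query response shape of the
// Prometheus HTTP API for a vector result.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix timestamp, "sample as string"]
		} `json:"result"`
	} `json:"data"`
}

// cpuPercentByInstance decodes a response body and returns CPU utilization
// in percent, keyed by the `instance` label (assumed to be present).
func cpuPercentByInstance(body []byte) (map[string]float64, error) {
	var resp promResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query status %q", resp.Status)
	}
	out := make(map[string]float64, len(resp.Data.Result))
	for _, r := range resp.Data.Result {
		s, ok := r.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("unexpected sample type %T", r.Value[1])
		}
		ratio, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		out[r.Metric["instance"]] = ratio * 100 // assumed ratio in [0, 1]
	}
	return out, nil
}

func main() {
	// Made-up sample payload in the documented response shape.
	sample := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"instance":"node-a"},"value":[1700000000,"0.25"]}]}}`)
	usage, err := cpuPercentByInstance(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(usage)
}
```

Running the same decoder on the responses for `instance:node_cpu:ratio` and `instance:node_cpu_utilisation:rate1m` would show how far the two values diverge for each node.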
It took me some time to find out what exactly the `instance:node_cpu:ratio` metric is. It seems the CPU and memory metrics come from the [helm-charts/charts/kube-prometheus-stack/templates/prometheus/rules/kube-prometheus-node-recording.rules.yaml](https://github.com/prometheus-community/helm-charts/blob/c4a7d10fdc6a0f694d9b97e9446207ba67d997dd/charts/kube-prometheus-stack/templates/prometheus/rules/kube-prometheus-node-recording.rules.yaml) rule, which is removed and seems...
Currently, it is unclear which load-watcher version is compatible with which kube-scheduler/scheduler-plugins version. We should have a table that declares release compatibility.
Currently, no tests exist for each metric provider. These need to be added for code coverage and resilient clients.
It would be nice to have contribution guidelines defined. Also, a script to check basic code formatting issues would save time in PR reviews and avoid unintentional oversights. This can...
time="2025-01-12T05:05:16Z" level=error msg="received error while fetching metrics: nodes is forbidden: User \"system:serviceaccount:loadwatcher:default\" cannot list resource \"nodes\" in API group \"\" at the cluster scope" func="github.com/paypal/load-watcher/pkg/watcher.(*Watcher).StartWatching.func1" file="/go/src/github.com/paypal/load-watcher/pkg/watcher/watcher.go:136"
time="2025-01-12T05:05:16Z" level=error msg="received error...