Get node usage stats from kubernetes resource-metrics-pipeline in usage-based-scheduling
What would you like to be added:
Support an additional way to get nodes' CPU and memory usage stats in usage-based scheduling: the Kubernetes resource metrics pipeline.
Why is this needed:
As with HPA, getting node usage info from the Kubernetes resource metrics pipeline is more efficient and gives data that is closer to real time.
I also think the feature "Support Rescheduling Based on Real Node Load" needs more up-to-date usage stats for nodes and pods.
If the feature is reasonable, I can help implement it. Thank you.
Thanks for your advice! IMO, it is another way to get node metrics, and one that is native to Kubernetes. As the Note section of the Resource metrics pipeline documentation says, the metrics API only offers the minimum CPU and memory metrics needed to enable automatic scaling with HPA and/or VPA, on condition that the metrics pipeline is enabled. So I wonder how to solve the following problems:
- What if the rescheduling feature is configured while the pipeline is not enabled?
- How can the metrics be extended with custom ones if users have specific scenarios?
- For question 1, we check scheduler-config when the scheduler starts. If the rescheduling feature is configured and the metrics datasource is the pipeline, we request "apis/metrics.k8s.io/v1beta1/nodes" to check whether the pipeline is enabled. If it is not, the scheduler panics with an error.
- For question 2, users who need specific custom metrics should deploy and configure a prometheus-adapter. The monitoring pipeline fetches metrics from the kubelet and then exposes them to Kubernetes via an adapter that implements either the custom.metrics.k8s.io or the external.metrics.k8s.io API. For example, for DCGM-exporter metrics, we could deploy and configure it by following this guide
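To make the startup check from question 1 concrete, here is a minimal sketch, not the actual scheduler code. It assumes a plain HTTP GET against "apis/metrics.k8s.io/v1beta1/nodes" and treats any non-200 response as "pipeline not enabled". The names checkMetricsPipeline and fakeAPIServer are hypothetical, and the fake server only stands in for a real API server so the example runs on its own. A real implementation would more likely go through client-go or the k8s.io/metrics clientset, with proper authentication and TLS.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Path of the resource metrics API for nodes, as quoted in the comment above.
const nodeMetricsPath = "/apis/metrics.k8s.io/v1beta1/nodes"

// nodeMetricsList mirrors only the fields of a NodeMetricsList response
// that this sketch reads.
type nodeMetricsList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Usage map[string]string `json:"usage"`
	} `json:"items"`
}

// checkMetricsPipeline (hypothetical helper) requests the node metrics
// endpoint and returns how many nodes reported usage. Any transport error,
// non-200 status, or undecodable body is reported as an error.
func checkMetricsPipeline(client *http.Client, apiServerURL string) (int, error) {
	resp, err := client.Get(apiServerURL + nodeMetricsPath)
	if err != nil {
		return 0, fmt.Errorf("metrics API unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("metrics API returned %s", resp.Status)
	}
	var list nodeMetricsList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		return 0, fmt.Errorf("decoding NodeMetricsList: %w", err)
	}
	return len(list.Items), nil
}

// fakeAPIServer (hypothetical, for this example only) serves a canned
// NodeMetricsList when enabled is true and 404 for every path otherwise.
func fakeAPIServer(enabled bool) *httptest.Server {
	mux := http.NewServeMux()
	if enabled {
		mux.HandleFunc(nodeMetricsPath, func(w http.ResponseWriter, r *http.Request) {
			w.Header().Set("Content-Type", "application/json")
			fmt.Fprint(w, `{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1",`+
				`"items":[{"metadata":{"name":"node-1"},"usage":{"cpu":"250m","memory":"1024Mi"}}]}`)
		})
	}
	return httptest.NewServer(mux)
}

func main() {
	srv := fakeAPIServer(true)
	defer srv.Close()

	n, err := checkMetricsPipeline(srv.Client(), srv.URL)
	if err != nil {
		// Mirrors the proposal: refuse to start if the pipeline is missing.
		panic(err)
	}
	fmt.Printf("metrics pipeline enabled, %d node(s) reported\n", n)
}
```

Whether a missing pipeline should panic or just disable the feature with a warning is a design choice. The sketch returns an error so the caller can decide.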
Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗