Get node usage stats from kubernetes resource-metrics-pipeline in usage-based-scheduling

Open chanhz opened this issue 3 years ago • 3 comments

What would you like to be added:

Support another way to get nodes' CPU and memory usage stats in usage-based scheduling: through the Kubernetes resource metrics pipeline.

Why is this needed:

As HPA does, getting node usage info from the Kubernetes resource metrics pipeline is more efficient and gives closer to real-time data.

I also think the feature "Support Rescheduling Based on Real Node Load" needs fresher, closer to real-time stats for nodes and pods.
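To make the request concrete, here is a minimal sketch of turning a `NodeMetricsList` response from the metrics API into per-node CPU and memory numbers. It is written in Python for brevity rather than as Volcano code; `parse_quantity`, `node_usage`, and the sample payload are hypothetical names and data made up for illustration, and the suffix table covers only a subset of Kubernetes quantity suffixes.

```python
import json
from decimal import Decimal

# Subset of Kubernetes quantity suffixes (assumption: enough for typical
# metrics API output such as "250m", "156225415n", or "2048Ki").
_SUFFIXES = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
}


def parse_quantity(text):
    """Convert a quantity string to a Decimal in base units (cores or bytes)."""
    # Check two-character suffixes first so "Mi" is not read as "M" + junk.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def node_usage(payload):
    """Map node name -> (cpu_cores, memory_bytes) from a NodeMetricsList JSON string."""
    usage = {}
    for item in json.loads(payload).get("items", []):
        name = item["metadata"]["name"]
        cpu = parse_quantity(item["usage"]["cpu"])
        memory = parse_quantity(item["usage"]["memory"])
        usage[name] = (cpu, memory)
    return usage


# Hypothetical sample shaped like a metrics.k8s.io/v1beta1 NodeMetricsList.
SAMPLE = json.dumps({
    "kind": "NodeMetricsList",
    "apiVersion": "metrics.k8s.io/v1beta1",
    "items": [{
        "metadata": {"name": "node-a"},
        "window": "10s",
        "usage": {"cpu": "250m", "memory": "2048Ki"},
    }],
})
```

Calling `node_usage(SAMPLE)` yields one entry for `node-a`, with CPU expressed in cores and memory in bytes. A real implementation would more likely use the Kubernetes client libraries and their quantity types instead of a hand-rolled parser.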

chanhz avatar Jul 19 '22 04:07 chanhz

If the feature is reasonable, I can help implement it. Thank you.

chanhz avatar Jul 19 '22 05:07 chanhz

Thanks for your advice! IMO, this is another way to get node metrics, and it is native to Kubernetes. As the Note section of the Resource metrics pipeline documentation says, the metrics API only offers the minimum CPU and memory metrics needed to enable automatic scaling with HPA and/or VPA, on condition that the metrics pipeline is enabled. So I wonder how to solve the following problems:

  • What if the rescheduling feature is configured while the pipeline is not enabled?
  • How can the custom metrics be extended if users have specific scenarios?

Thor-wl avatar Jul 19 '22 07:07 Thor-wl

  • What if the rescheduling feature is configured while the pipeline is not enabled?
  • How can the custom metrics be extended if users have specific scenarios?
  • For question 1, we check the scheduler config when the scheduler starts. If the rescheduling feature is configured and the metrics data source is the pipeline, the scheduler requests "apis/metrics.k8s.io/v1beta1/nodes" to check whether the pipeline is enabled. If it is not enabled, the scheduler panics with an error.
  • For question 2, users who need custom metrics should deploy and configure a prometheus-adapter. The monitoring pipeline fetches metrics from the kubelet and then exposes them to Kubernetes via an adapter that implements either the custom.metrics.k8s.io or external.metrics.k8s.io API. For example, for DCGM-exporter metrics, we could deploy and configure the adapter by following this guide.
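The startup check proposed for question 1 could look roughly like the sketch below. It is in Python for brevity, not Volcano's Go code. `check_metrics_pipeline`, `MetricsPipelineUnavailable`, and the `fetch` callable (which takes a path and returns an HTTP status and body) are hypothetical names introduced here; only the request path comes from the comment above. Raising an exception stands in for the panic.

```python
import json

# Path taken from the proposal above.
METRICS_NODES_PATH = "/apis/metrics.k8s.io/v1beta1/nodes"


class MetricsPipelineUnavailable(RuntimeError):
    """Raised at startup when the resource metrics pipeline cannot be reached."""


def check_metrics_pipeline(fetch):
    """Probe the metrics API once and fail fast if it is not serving.

    `fetch` is a hypothetical callable: fetch(path) -> (status_code, body_text).
    In a real scheduler it would wrap an authenticated API server client.
    """
    status, body = fetch(METRICS_NODES_PATH)
    if status != 200:
        raise MetricsPipelineUnavailable(
            f"GET {METRICS_NODES_PATH} returned HTTP {status}"
        )
    try:
        kind = json.loads(body).get("kind")
    except (ValueError, AttributeError) as exc:
        # Covers non-JSON bodies and JSON that is not an object.
        raise MetricsPipelineUnavailable("metrics API returned an unexpected body") from exc
    if kind != "NodeMetricsList":
        raise MetricsPipelineUnavailable(f"unexpected response kind: {kind!r}")


def fake_enabled(path):
    # Stand-in for a cluster where the metrics pipeline is deployed.
    return 200, '{"kind": "NodeMetricsList", "items": []}'


def fake_missing(path):
    # Stand-in for a cluster where the metrics API is not registered.
    return 404, "404 page not found"
```

With `fake_enabled` the check returns quietly; with `fake_missing` it raises `MetricsPipelineUnavailable`, which the scheduler's startup code could treat as fatal. Injecting `fetch` keeps the probe testable without a cluster, which is a design choice of this sketch and not something stated in the thread.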

chanhz avatar Jul 19 '22 09:07 chanhz

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Oct 19 '22 01:10 stale[bot]

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

stale[bot] avatar Dec 31 '22 21:12 stale[bot]