[BUG] Metrics Server periodically returning service unavailable
**Describe the bug**
Numerous events stating:
```
failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io) source: component: horizontal-pod-autoscaler
```
**To Reproduce**
Steps to reproduce the behavior:
- Install an AKS cluster on Kubernetes 1.30.3 with Istio enabled, API Server VNet integration, and Node Autoprovisioning
Events start spamming the event log. The metrics server is running, but `kubectl top pod` fails sporadically with:
```
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
```
The metrics server also appears to be running quite hot for a small cluster with < 10 nodes (using default AKS settings):
```
metrics-server-7dddddfd7d-22ddd   148m   88Mi
metrics-server-7dddddfd7d-5wkcv   148m   72Mi
```
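When the aggregated metrics API intermittently returns `ServiceUnavailable`, the `APIService` object backing `pods.metrics.k8s.io` is typically flapping between `Available` and `Unavailable`. A few standard `kubectl` checks that may help localize this (the label selector `k8s-app=metrics-server` is an assumption based on common AKS defaults; adjust it to match your deployment):

```shell
# Check whether the aggregated API the HPA depends on is currently Available
kubectl get apiservice v1beta1.metrics.k8s.io

# Inspect the condition message the aggregation layer reports on failure
kubectl describe apiservice v1beta1.metrics.k8s.io

# Look for restarts or readiness-probe failures on the metrics-server pods
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl describe pods -n kube-system -l k8s-app=metrics-server
```

If `describe apiservice` shows `FailedDiscoveryCheck`, the API server cannot reach the metrics-server service endpoints, which points at networking or readiness rather than metrics-server itself.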
**Expected behavior**
The metrics server does not periodically fail.
**Environment (please complete the following information):**
- Kubernetes version: 1.30.3
**Additional context**
```
$ kubectl logs -n kube-system deployment/metrics-server
Found 2 pods, using pod/metrics-server-7dddddfd7d-22ddd
Defaulted container "metrics-server-vpa" out of: metrics-server-vpa, metrics-server
I0906 22:40:45.124467 1 pod_nanny.go:86] Invoked by [/pod_nanny --config-dir=/etc/config --cpu=150m --extra-cpu=0.5m --memory=100Mi --extra-memory=4Mi --poll-period=180000 --threshold=5 --deployment=metrics-server --container=metrics-server]
I0906 22:40:45.124570 1 pod_nanny.go:87] Version: 1.8.22
I0906 22:40:45.124594 1 pod_nanny.go:109] Watching namespace: kube-system, pod: metrics-server-7dddddfd7d-22ddd, container: metrics-server.
I0906 22:40:45.124600 1 pod_nanny.go:110] storage: MISSING, extra_storage: 0Gi
I0906 22:40:45.125127 1 pod_nanny.go:214] Failed to read data from config file "/etc/config/NannyConfiguration": open /etc/config/NannyConfiguration: no such file or directory, using default parameters
I0906 22:40:45.125149 1 pod_nanny.go:144] cpu: 150m, extra_cpu: 0.5m, memory: 100Mi, extra_memory: 4Mi
I0906 22:40:45.125159 1 pod_nanny.go:278] Resources: [{Base:{i:{value:150 scale:-3} d:{Dec:
```
I attempted to mitigate by giving metrics-server more resources per https://learn.microsoft.com/en-us/azure/aks/use-metrics-server-vertical-pod-autoscaler#manually-configure-metrics-server-resource-usage. It doesn't seem to help; approximately 1 out of 4 requests still fails with:
```
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
```
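For context, the knob that linked AKS doc exposes is a ConfigMap read by the pod nanny (addon-resizer); the `Failed to read data from config file "/etc/config/NannyConfiguration"` line in the logs above just means no such ConfigMap exists and built-in defaults are used. A minimal sketch of that ConfigMap, with illustrative resource values (names follow the AKS doc; tune `baseCPU`/`baseMemory` for your cluster):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
    baseCPU: 150m
    cpuPerNode: 1m
    baseMemory: 100Mi
    memoryPerNode: 8Mi
```

After applying it, the metrics-server pods need to be restarted for the nanny to pick up the new configuration.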
Maybe there is an incompatibility with metrics-server v0.6.3, which AKS uses?
@xiazhan we looked at this from the service-mesh side and do not think this is an Istio issue. Could you investigate on the metrics-server side?
I've got the same issue. Any advice on that?
Same here on AKS: 47 nodes running, and the HPA sometimes has difficulty reaching the metrics-server.
This issue still occurs
This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed if no further activity occurs within 7 days of this comment. Please review @xiazhan.
This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed if no further activity occurs within 7 days of this comment. Please review @xiazhan, @kthakar1990, @stl327, @huizhifan.
This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. @vikas-rajvanshy feel free to comment again in the next 7 days to reopen, or open a new issue after that time if you still have a question, issue, or suggestion.