[EKS] [request]: Add PodDisruptionBudget and safe-to-evict annotation to coredns addon
cluster-autoscaler cannot scale down an (otherwise) empty node:
Fast evaluation: node ip-192-168-25-147.eu-north-1.compute.internal cannot be removed: non-daemonset, non-mirrored, non-pdb-assigned kube-system pod present: coredns-5bf7669654-l46wf
Creating a PDB like this:
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: coredns
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      eks.amazonaws.com/component: coredns
fixes this, but then local storage prevents scaledown:
Fast evaluation: node ip-192-168-25-147.eu-north-1.compute.internal cannot be removed: pod with local storage present: coredns-5bf7669654-l46wf
When I add the annotation
kubectl annotate pod -n kube-system -l eks.amazonaws.com/component=coredns "cluster-autoscaler.kubernetes.io/safe-to-evict=true"
the node can be scaled down.
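Note that an annotation set with kubectl annotate pod lives only on the existing pods, so replacement pods created from the Deployment's template will not carry it. A minimal sketch of a more durable variant is to put the annotation on the pod template instead. This assumes the addon runs as a Deployment named coredns in kube-system; the file name safe-to-evict-patch.yaml is hypothetical, and the managed addon may overwrite the change on its next update.

```yaml
# safe-to-evict-patch.yaml (hypothetical file name)
# Patch for the pod template, so that every pod the Deployment
# creates carries the annotation cluster-autoscaler looks for.
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```

It could be applied with kubectl patch deployment coredns -n kube-system --patch-file safe-to-evict-patch.yaml, which triggers a rollout of new coredns pods.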
Which service(s) is this request for? EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
If the cluster has scaled up to a 96-core machine, it will stay on for no reason. Adding the PDB and the safe-to-evict annotation will allow it to shut down.
Are you currently working around this issue? see above
Additional context
See original issue in https://github.com/weaveworks/eksctl/issues/4969
related https://github.com/aws/containers-roadmap/issues/1675
I noticed the same today with the EBS CSI addon (specifically the ebs-csi-controller pod using local storage) and had to run the following:
kubectl annotate pod -n kube-system -l app.kubernetes.io/component=ebs-csi-controller "cluster-autoscaler.kubernetes.io/safe-to-evict=true"
For both of these addons, having to do this is far from ideal. Being able to configure this directly, or having them marked as safe to evict by default, would be much preferable.
Seems to be linked to https://github.com/aws/containers-roadmap/issues/1028.
Closing as duplicate of #1028