containers-roadmap [EKS] [request]: Add PodDisruptionBudget and safe to evict label to coredns addon

cluster-autoscaler cannot scale down an otherwise empty node:

Fast evaluation: node ip-192-168-25-147.eu-north-1.compute.internal cannot be removed: non-daemonset, non-mirrored, non-pdb-assigned kube-system pod present: coredns-5bf7669654-l46wf

Creating a PDB like this:

---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: coredns
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      eks.amazonaws.com/component: coredns

fixes this, but then the pod's local storage prevents scale-down:

Fast evaluation: node ip-192-168-25-147.eu-north-1.compute.internal cannot be removed: pod with local storage present: coredns-5bf7669654-l46wf

When I add the safe-to-evict annotation:

kubectl annotate pod -n kube-system -l eks.amazonaws.com/component=coredns "cluster-autoscaler.kubernetes.io/safe-to-evict=true"

the node can be scaled down.
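
Annotating the running pod only lasts until that pod is replaced, since new pods are created from the Deployment's pod template. A minimal sketch of a more durable workaround is to patch the template instead. This assumes the addon's Deployment is named coredns in kube-system (suggested by the pod name above, not confirmed here). PATCH and patch_coredns_template are hypothetical names, and the managed addon may overwrite the change when it is updated.

```shell
# JSON merge patch that sets the annotation on the pod template,
# so every pod the Deployment creates carries it.
PATCH='{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'

# Hypothetical wrapper; assumes the Deployment is called "coredns".
patch_coredns_template() {
  kubectl patch deployment coredns -n kube-system --type merge --patch "$PATCH"
}
```

Calling patch_coredns_template changes the pod template, so it triggers a rolling restart of the CoreDNS pods.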

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

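If the cluster has scaled up a 96-core machine, it stays on for no reason other than the coredns pod running on it. Adding the PDB and the safe-to-evict annotation to the addon would allow the node to shut down.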

Are you currently working around this issue? Yes, see above.

Additional context

See original issue in https://github.com/weaveworks/eksctl/issues/4969

Mar 18 '22 14:03 matti

related https://github.com/aws/containers-roadmap/issues/1675

Mar 19 '22 01:03 jonathan-conder-sm

I noticed the same today with the EBS CSI addon (specifically the ebs-csi-controller pod using local storage) and had to run the following:

kubectl annotate pod -n kube-system -l app.kubernetes.io/component=ebs-csi-controller "cluster-autoscaler.kubernetes.io/safe-to-evict=true"

For both of these addons, having to do this is far from ideal. Being able to configure this directly, or having the pods marked as safe to evict by default, would be much preferable.

Sep 12 '22 14:09 gygitlab

Seems to be linked to https://github.com/aws/containers-roadmap/issues/1028.

Nov 17 '22 21:11 yann-soubeyrand

Closing as duplicate of #1028

Jan 10 '23 05:01 mikestef9