[Bug] Cluster Deletion fails with "Error: deadline surpassed waiting for AWS load balancers to be deleted"
What were you trying to accomplish?
I'm trying to delete an eksctl-managed cluster that contains AWS Application Load Balancers managed by the aws-lb-controller (https://kubernetes-sigs.github.io/aws-load-balancer-controller).
What happened?
Cluster deletion times out with the error below:
"cmd": ["eksctl", "delete", "cluster", "--region", "eu-central-1", "--name", "sandbox", "--wait"]
STDOUT:
2024-02-09 20:02:45 [ℹ] deleting EKS cluster "sandbox"
2024-02-09 20:02:46 [ℹ] will drain 0 unmanaged nodegroup(s) in cluster "sandbox"
2024-02-09 20:02:46 [ℹ] starting parallel draining, max in-flight of 1
2024-02-09 20:02:46 [ℹ] deleted 0 Fargate profile(s)
2024-02-09 20:02:47 [✔] kubeconfig has been updated
2024-02-09 20:02:47 [ℹ] cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
STDERR:
Error: deadline surpassed waiting for AWS load balancers to be deleted: k8s-sharedtools-5732128751
How to reproduce it?
- Deploy a new EKS cluster (I used 1.28) with eksctl >= 0.144.0 and the vpc-cni addon
- Provision the aws-lb-controller as described in the docs: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/deploy/installation/
- Set up an ingress referencing an Application Loadbalancer. In my case, I am using annotations on the Ingress object:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: alb
- Wait until the loadbalancer has been successfully created
- Delete the EKS cluster:
    eksctl delete cluster --region eu-central-1 --name sandbox --wait
Anything else we need to know?
According to my research, the problem occurs because the AWS VPC CNI (aws-node daemonset) is deleted before the associated Kubernetes Service and Ingress objects are deleted. Once the CNI daemonset is gone, the aws-lb-controller pods can no longer process the finalizers on these objects. The objects then get stuck and cannot be deleted in Kubernetes.
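One way to check whether objects are stuck like this: an object that is waiting on a finalizer has metadata.deletionTimestamp set while metadata.finalizers is still non-empty. Below is a minimal Python sketch that lists such objects. It assumes the input is the parsed JSON of a kubectl list (for example from `kubectl get ingress,service --all-namespaces -o json`); the function name `stuck_in_deletion` is made up for illustration and is not part of eksctl or the controller.

```python
from __future__ import annotations

from typing import Any


def stuck_in_deletion(kube_list: dict[str, Any]) -> list[tuple[str, str, str, list[str]]]:
    """Return (kind, namespace, name, finalizers) for objects pending deletion.

    An object counts as pending when Kubernetes has set
    metadata.deletionTimestamp but metadata.finalizers is not yet empty.
    """
    stuck = []
    for item in kube_list.get("items", []):
        meta = item.get("metadata") or {}
        finalizers = meta.get("finalizers") or []
        if meta.get("deletionTimestamp") and finalizers:
            stuck.append(
                (
                    item.get("kind", ""),
                    meta.get("namespace", ""),
                    meta.get("name", ""),
                    list(finalizers),
                )
            )
    return stuck
```

If this returns Ingress or Service entries after the cluster deletion has timed out, that would be consistent with the explanation above, although it does not prove which component failed to remove the finalizer.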
As far as I can tell, the cluster deletion process is as follows:
- VPC CNI gets deleted: https://github.com/aaroniscode/eksctl/blob/main/pkg/actions/cluster/owned.go#L95
- Shared resources get deleted: https://github.com/aaroniscode/eksctl/blob/main/pkg/actions/cluster/owned.go#L105
- Shared resources include AWS LB: https://github.com/aaroniscode/eksctl/blob/main/pkg/actions/cluster/delete.go#L63
- AWS LB cleanup now (since PR: https://github.com/eksctl-io/eksctl/pull/6389) includes deletion of AWS LB Controller managed resources: https://github.com/aaroniscode/eksctl/blob/08bd92c91037ca21ec18c04277d9d6ba4d21d704/pkg/elb/cleanup.go#L96C2-L96C18
This issue has been happening for me since the upgrade to >= 0.144 (https://github.com/eksctl-io/eksctl/releases/tag/v0.144.0) and was probably introduced with https://github.com/eksctl-io/eksctl/pull/6389
Versions
eksctl info
eksctl version: 0.169.0
kubectl version: v1.24.10
OS: linux
Best regards, Florian.
Try the instructions here: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-delete.html
These are the steps to delete an Application Load Balancer:
- If you have a CNAME record for your domain that points to your load balancer, point it to a new location and wait for the DNS change to take effect before deleting your load balancer.
- Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/
- On the navigation pane, under LOAD BALANCING, choose Load Balancers.
- Select the load balancer, and then choose Actions, Delete.
- When prompted for confirmation, choose Yes, Delete.
I got the same error, but after I deleted the load balancer directly from the AWS console and ran the command again:
$ eksctl delete cluster --name
the cluster was deleted successfully.
Thanks, that sounds reasonable. For the time being I've implemented a similar routine: I first get all Ingress and Service resources from the cluster, filter them for any that are related to the aws-loadbalancer-controller(*), and then delete the associated Kubernetes resources. Only once they have been successfully deleted do I continue to delete the cluster.
(*)
- Services can be of spec.loadBalancerClass == 'service.k8s.aws/nlb'
- Ingresses can be of spec.ingressClassName == 'alb'
- or they can have the following annotation: metadata.annotations."kubernetes.io/ingress.class" == 'alb'
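For anyone who wants to do the same, here is a rough Python sketch of the filter step. It is not my exact routine, only an illustration of the three conditions listed above. It assumes the input is the parsed JSON of `kubectl get ingress,service --all-namespaces -o json`; the function names `is_lb_controller_managed` and `select_for_deletion` are made up for this example.

```python
from __future__ import annotations

from typing import Any

ALB_INGRESS_CLASS = "alb"
NLB_LOAD_BALANCER_CLASS = "service.k8s.aws/nlb"
INGRESS_CLASS_ANNOTATION = "kubernetes.io/ingress.class"


def is_lb_controller_managed(obj: dict[str, Any]) -> bool:
    """Apply the three conditions from the list above to a single object."""
    kind = obj.get("kind")
    spec = obj.get("spec") or {}
    annotations = (obj.get("metadata") or {}).get("annotations") or {}
    if kind == "Service":
        return spec.get("loadBalancerClass") == NLB_LOAD_BALANCER_CLASS
    if kind == "Ingress":
        return (
            spec.get("ingressClassName") == ALB_INGRESS_CLASS
            or annotations.get(INGRESS_CLASS_ANNOTATION) == ALB_INGRESS_CLASS
        )
    return False


def select_for_deletion(kube_list: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (kind, namespace, name) for every matching item in a kubectl list."""
    selected = []
    for item in kube_list.get("items", []):
        if is_lb_controller_managed(item):
            meta = item["metadata"]
            selected.append((item["kind"], meta.get("namespace", "default"), meta["name"]))
    return selected
```

Each returned entry can then be removed with `kubectl delete <kind> <name> -n <namespace>` while the controller and the CNI are still running, and `eksctl delete cluster` is run only after those deletions have completed. The sketch covers only the three conditions above, so load balancers selected in other ways would need additional checks.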