[Bug] eksctl does not try to delete the failed CloudFormation stack if cluster creation rolled back.

Open git4example opened this issue 2 years ago • 9 comments

What were you trying to accomplish?

Create an EKS Outpost local private cluster. If cluster creation fails and rolls back because of a misconfiguration, the eksctl delete command should still attempt to delete the CFN stack.

What happened?

An incorrect SG list caused cluster creation to roll back: "securityGroup": "sg-04f5e2f0a73d291234,sg-0e0c20266c5db5678",

However, the eksctl delete cluster --region=us-east-1 --name=mycluster-localv4 command only looks for the cluster. It should also look for the CFN stack and check its status, so that it can clean up any remaining AWS resources, including the CFN stack itself.

2023-05-19 03:09:54 [ℹ]  deploying stack "eksctl-mycluster-localv4-cluster"
2023-05-19 03:10:24 [ℹ]  waiting for CloudFormation stack "eksctl-mycluster-localv4-cluster"
2023-05-19 03:10:54 [ℹ]  waiting for CloudFormation stack "eksctl-mycluster-localv4-cluster"
2023-05-19 03:11:54 [ℹ]  waiting for CloudFormation stack "eksctl-mycluster-localv4-cluster"
2023-05-19 03:11:54 [✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-mycluster-localv4-cluster"
2023-05-19 03:11:54 [ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
2023-05-19 03:11:54 [!]  AWS::EKS::Cluster/ControlPlane: DELETE_IN_PROGRESS
2023-05-19 03:11:54 [▶]  AWS::CloudFormation::Stack/eksctl-mycluster-localv4-cluster: ROLLBACK_IN_PROGRESS – "The following resource(s) failed to create: [ControlPlane]. Rollback requested by user."
2023-05-19 03:11:54 [✖]  AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Resource handler returned message: \"Error occured while creating cluster. \nClusterIssue(Code=ConfigurationConflict, Message=user 071249811116 does not own a resource sg-04f5e2f0a73d29883,sg-0e0c20266c5dbd37a, ResourceIds=[subnet-0874b9eeff669c097]) \n\" (RequestToken: 263121e1-409e-fe8d-3a57-0d2b759fc5d0, HandlerErrorCode: NotStabilized)"
2023-05-19 03:11:54 [▶]  AWS::EKS::Cluster/ControlPlane: CREATE_IN_PROGRESS – "Resource creation Initiated"
2023-05-19 03:11:54 [▶]  AWS::EKS::Cluster/ControlPlane: CREATE_IN_PROGRESS
2023-05-19 03:11:54 [▶]  AWS::CloudFormation::Stack/eksctl-mycluster-localv4-cluster: CREATE_IN_PROGRESS – "User Initiated"
2023-05-19 03:11:54 [▶]  failed task: create cluster control plane "mycluster-localv4" (will not run other sequential tasks)
2023-05-19 03:11:54 [!]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2023-05-19 03:11:54 [ℹ]  to cleanup resources, run 'eksctl delete cluster --region=us-east-1 --name=mycluster-localv4'
2023-05-19 03:11:54 [✖]  ResourceNotReady: failed waiting for successful resource state
Error: failed to create cluster "mycluster-localv4"
Admin:~/environment $ eksctl delete cluster --region=us-east-1 --name=mycluster-localv4
Error: unable to describe cluster control plane: operation error EKS: DescribeCluster, https response error StatusCode: 404, RequestID: 8cde0ea8-2897-439b-adcb-71d22e9a57e4, ResourceNotFoundException: No cluster found for name: mycluster-localv4.
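
The requested behavior can be sketched as a small decision table. This is only an illustration of the idea, not eksctl's actual code: `plan_cleanup`, `cluster_stack_name`, and the action labels are hypothetical names, and the stack naming pattern is inferred from the log lines above. The status strings are standard CloudFormation stack statuses.

```python
from typing import Optional

# CloudFormation statuses in which a failed stack is at rest and can be deleted.
DELETABLE = {"ROLLBACK_COMPLETE", "ROLLBACK_FAILED", "CREATE_FAILED", "DELETE_FAILED"}


def cluster_stack_name(cluster_name: str) -> str:
    # Pattern inferred from the logs, e.g. "eksctl-mycluster-localv4-cluster".
    return f"eksctl-{cluster_name}-cluster"


def plan_cleanup(cluster_found: bool, stack_status: Optional[str]) -> str:
    """Hypothetical helper: pick a cleanup action for `delete cluster`.

    cluster_found: whether DescribeCluster returned the cluster (False on a 404).
    stack_status:  status of the cluster stack, or None if no stack exists.
    """
    if cluster_found:
        return "delete-cluster-and-stack"
    if stack_status is None:
        return "nothing-to-clean"
    if stack_status.endswith("_IN_PROGRESS"):
        # For example ROLLBACK_IN_PROGRESS right after the failed create.
        return "wait-then-retry"
    if stack_status in DELETABLE:
        return "delete-stack"
    return "inspect-manually"


print(cluster_stack_name("mycluster-localv4"))  # eksctl-mycluster-localv4-cluster
print(plan_cleanup(False, "ROLLBACK_COMPLETE"))  # delete-stack
```

The point of the sketch is that a 404 from DescribeCluster does not have to end the delete command: the stack status alone is enough to decide whether anything is left to remove.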

How to reproduce it?

Used a config similar to the one in https://github.com/weaveworks/eksctl/issues/6619; the only difference was that it included two SGs: `"securityGroup": "sg-04f5e2f0a73d291234,sg-0e0c20266c5db5678",`

Logs

Anything else we need to know?

Versions

$ eksctl info
eksctl version: 0.141.0
kubectl version: v1.26.2-eks-a59e1f0
OS: linux

git4example avatar May 19 '23 03:05 git4example

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Jun 19 '23 01:06 github-actions[bot]

Still need a fix for this issue.

git4example avatar Jun 20 '23 04:06 git4example

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Oct 23 '23 01:10 github-actions[bot]

Still need a fix for this issue.

git4example avatar Oct 23 '23 07:10 git4example

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Nov 23 '23 01:11 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Nov 28 '23 01:11 github-actions[bot]

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Dec 29 '23 01:12 github-actions[bot]

Still need a fix for this issue.

git4example avatar Dec 29 '23 01:12 git4example

I have a similar issue.

During cluster creation I encountered the following error:

2024-01-31 15:13:34 [ℹ]  building cluster stack "eksctl-dev-v2-cluster"
2024-01-31 15:13:35 [ℹ]  deploying stack "eksctl-dev-v2-cluster"
2024-01-31 15:14:05 [ℹ]  waiting for CloudFormation stack "eksctl-dev-v2-cluster"
2024-01-31 15:14:06 [✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-dev-v2-cluster"
2024-01-31 15:14:06 [✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-dev-v2-cluster"
2024-01-31 15:14:06 [ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
2024-01-31 15:14:06 [!]  AWS::EC2::InternetGateway/InternetGateway: DELETE_IN_PROGRESS
2024-01-31 15:14:06 [!]  AWS::EC2::VPC/VPC: DELETE_IN_PROGRESS
2024-01-31 15:14:06 [!]  AWS::IAM::Role/ServiceRole: DELETE_IN_PROGRESS
2024-01-31 15:14:06 [✖]  AWS::EC2::VPC/VPC: CREATE_FAILED – "Resource creation cancelled"
2024-01-31 15:14:06 [✖]  AWS::IAM::Role/ServiceRole: CREATE_FAILED – "Resource creation cancelled"
2024-01-31 15:14:06 [✖]  AWS::EC2::InternetGateway/InternetGateway: CREATE_FAILED – "Resource creation cancelled"
2024-01-31 15:14:06 [✖]  AWS::EC2::EIP/NATIPEUCENTRAL1B: CREATE_FAILED – "Resource handler returned message: \"The maximum number of addresses has been reached. (Service: Ec2, Status Code: 400, Request ID: f3dc41c8-e029-4ea7-9044-3b328a787e70)\" (RequestToken: a29a7e2b-ffdd-0f6a-7aa8-742156a421d2, HandlerErrorCode: GeneralServiceException)"
2024-01-31 15:14:06 [✖]  AWS::EC2::EIP/NATIPEUCENTRAL1A: CREATE_FAILED – "Resource handler returned message: \"The maximum number of addresses has been reached. (Service: Ec2, Status Code: 400, Request ID: 4062ba92-1585-4beb-b0f1-da940082f01f)\" (RequestToken: 0244f857-3c7f-23e6-3b12-84486909b70c, HandlerErrorCode: GeneralServiceException)"
2024-01-31 15:14:06 [✖]  AWS::EC2::EIP/NATIPEUCENTRAL1C: CREATE_FAILED – "Resource handler returned message: \"The maximum number of addresses has been reached. (Service: Ec2, Status Code: 400, Request ID: cca30a28-6d2b-4e2e-bdb6-70ee579ab6fb)\" (RequestToken: 8c564867-5347-c32a-4268-d2a1aab46446, HandlerErrorCode: GeneralServiceException)"
2024-01-31 15:14:06 [!]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2024-01-31 15:14:06 [ℹ]  to cleanup resources, run 'eksctl delete cluster --region=eu-central-1 --name=dev-v2'
2024-01-31 15:14:06 [✖]  ResourceNotReady: failed waiting for successful resource state
Error: failed to create cluster "dev-v2"

The log shows that the service quota for EC2-VPC Elastic IPs was exceeded. After increasing this quota, I ran the cleanup command that the log suggests, eksctl delete cluster --region=eu-central-1 --name=dev-v2, which failed with:

Error: unable to describe cluster control plane: operation error EKS: DescribeCluster, https response error StatusCode: 404, RequestID: a93dec92-c7b2-4e90-8932-e929ec8f1231, ResourceNotFoundException: No cluster found for name: dev-v2.

So I skipped the cleanup and ran the create command again. Output:

2024-01-31 15:54:25 [!]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2024-01-31 15:54:25 [ℹ]  to cleanup resources, run 'eksctl delete cluster --region=eu-central-1 --name=dev-v2'
2024-01-31 15:54:25 [✖]  creating CloudFormation stack "eksctl-dev-v2-cluster": operation error CloudFormation: CreateStack, https response error StatusCode: 400, RequestID: 83811b48-03d7-4c67-8c31-c63230d6cc4e, AlreadyExistsException: Stack [eksctl-dev-v2-cluster] already exists
Error: failed to create cluster "dev-v2"

The command failed because the CloudFormation stack already existed, left over from the previous create command.

The CloudFormation stack should be deleted either by the create cluster command (in case of a rollback) or by the delete cluster command.
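
Until that happens, the leftover stack can be removed by hand. A minimal sketch of one possible workaround follows, assuming a boto3 CloudFormation client is passed in as `cfn`; `delete_leftover_stack` and `FAILED_STATES` are hypothetical names, and the function assumes the stack exists (as the AlreadyExistsException above indicates).

```python
# States in which a failed stack is at rest and can be deleted.
FAILED_STATES = {"ROLLBACK_COMPLETE", "ROLLBACK_FAILED", "CREATE_FAILED"}


def delete_leftover_stack(cfn, stack_name: str) -> str:
    """Delete `stack_name` if it is in a failed state; return the status seen.

    `cfn` is assumed to behave like
    boto3.client("cloudformation", region_name="eu-central-1").
    """
    stacks = cfn.describe_stacks(StackName=stack_name)["Stacks"]
    status = stacks[0]["StackStatus"]
    if status in FAILED_STATES:
        cfn.delete_stack(StackName=stack_name)
        # Block until CloudFormation reports the stack as deleted.
        cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)
    return status


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="eu-central-1")
#   delete_leftover_stack(cfn, "eksctl-dev-v2-cluster")
```

Stacks in any other state are left untouched, so a healthy stack with the same name is not deleted by accident.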

Versions

$ eksctl info
eksctl version: 0.169.0
kubectl version: v1.29.0
OS: darwin

mnow-cd avatar Jan 31 '24 16:01 mnow-cd