eksctl [Bug] Unauthorized error when deleting a cluster

What were you trying to accomplish?

Delete a cluster that has IAM service accounts:

eksctl delete cluster --name my-cluster --wait

What happened?

I get an error when eksctl tries to determine whether the corresponding service account exists in the cluster.

2022-08-26 15:37:27 [✖]  checking whether serviceaccount "kube-system/aws-node" exists: Unauthorized

The next run of eksctl delete cluster cleans up the cluster successfully.

The audit log shows that eksctl tries to get the service account as an anonymous user (at least, the request is not mapped to any user or group):

{
[...]
    "requestURI": "/api/v1/namespaces/kube-system/serviceaccounts/aws-node",
    "verb": "get",
    "user": {},
    "objectRef": {
        "resource": "serviceaccounts",
        "namespace": "kube-system",
        "name": "aws-node",
        "apiVersion": "v1"
    },
    "responseStatus": {
        "status": "Failure",
        "reason": "Unauthorized",
        "code": 401
    },
[...]
}

(It can happen with any service account; the first one to be checked raises the error.)
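
A minimal sketch for spotting such requests in an exported audit log, assuming the events are available as one JSON object per line (an assumption about the export format; the field names are the ones in the excerpt above). The helper name find_unauthenticated and the second sample event are hypothetical.

```python
import json

def find_unauthenticated(lines):
    """Yield audit events that failed with 401 and carry no user identity."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        status = event.get("responseStatus", {})
        user = event.get("user") or {}
        # An empty "user" object means the request was not mapped to any identity.
        if status.get("code") == 401 and not user.get("username"):
            yield event

sample = [
    json.dumps({
        "requestURI": "/api/v1/namespaces/kube-system/serviceaccounts/aws-node",
        "verb": "get",
        "user": {},
        "responseStatus": {"status": "Failure", "reason": "Unauthorized", "code": 401},
    }),
    json.dumps({
        "requestURI": "/api/v1/namespaces/default/pods",
        "verb": "list",
        "user": {"username": "example-user"},  # made-up entry for contrast
        "responseStatus": {"code": 200},
    }),
]

for event in find_unauthenticated(sample):
    print(event["verb"], event["requestURI"])
```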

I have no issue with eksctl delete iamserviceaccount -f config.yaml --include '*/*' -w --approve.

How to reproduce it?

Create a cluster with IAM service accounts, then delete the cluster. (It is reproducible 100% of the time in our environment; I am not sure whether it happens in a simpler one.)

Logs

https://gist.github.com/vflaux/24e3aac2aefdaaa638764e07c8dc3f79

Anything else we need to know?

Versions

$ eksctl info
eksctl version: 0.109.0
kubectl version: v1.23.8
OS: linux

Aug 29 '22 17:08 vflaux

For me it goes like this:

2022-09-22 13:31:33 [ℹ]  deleting EKS cluster "..."
2022-09-22 13:31:35 [ℹ]  will drain 1 unmanaged nodegroup(s) in cluster "..."
2022-09-22 13:31:35 [ℹ]  starting parallel draining, max in-flight of 1
2022-09-22 13:31:35 [ℹ]  cordon node "ip-....us-east-2.compute.internal"
2022-09-22 13:32:39 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:33:41 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:34:44 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:35:46 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:36:49 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:37:52 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:38:55 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:39:57 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:41:00 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:42:02 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:43:05 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:44:07 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:45:10 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:46:12 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:47:15 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:48:18 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:49:20 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:50:23 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:51:26 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:52:28 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal
2022-09-22 13:52:59 [!]  pod eviction error ("errs: [Unauthorized]") on node ip-....us-east-2.compute.internal
2022-09-22 13:53:04 [✖]  Node group drain failed: %!w(*errors.errorString=&{errs: [Unauthorized]})
Error: errs: [Unauthorized]
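
To see how long the run lasted before the first Unauthorized error, the timestamps can be compared. A small sketch, assuming the log format shown above (timestamp in the first 19 characters of each line); elapsed_until_unauthorized is a hypothetical helper, and the sample is a shortened copy of the log.

```python
from datetime import datetime

def elapsed_until_unauthorized(log_lines):
    """Time between the first timestamped line and the first line mentioning Unauthorized."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = None
    for line in log_lines:
        try:
            ts = datetime.strptime(line[:19], fmt)
        except ValueError:
            continue  # line without a leading timestamp
        if start is None:
            start = ts
        if "Unauthorized" in line:
            return ts - start
    return None  # no Unauthorized line found

log = [
    '2022-09-22 13:31:33 [ℹ]  deleting EKS cluster "..."',
    '2022-09-22 13:32:39 [!]  1 pods are unevictable from node ip-....us-east-2.compute.internal',
    '2022-09-22 13:52:59 [!]  pod eviction error ("errs: [Unauthorized]") on node ip-....us-east-2.compute.internal',
]

print(elapsed_until_unauthorized(log))  # 0:21:26
```

For this log the first Unauthorized error appears a little over 21 minutes after the deletion started.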

Sep 22 '22 21:09 namgk

This happened after I created the cluster myself, so I assume I'm the owner of the cluster. Why can't the owner delete the cluster?

Sep 22 '22 21:09 namgk

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Oct 23 '22 02:10 github-actions[bot]

I still encounter this error each time I delete a cluster.

Oct 26 '22 10:10 vflaux

I'm running into the same bug in a different path (eksctl delete nodegroup, with parallelism of 15). It happens exactly at 20 mins, and I wonder if it's just a TTL in the kubeconfig?

Fwiw, smaller nodegroups will delete just fine.
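
If the cause were an expiring credential, as guessed above, a long-running operation would need to refresh its token before each request rather than reuse the one obtained at the start. A minimal sketch of that idea, not eksctl's actual implementation; RefreshingToken, fake_fetch, and the TTL and margin values are all hypothetical.

```python
import time

class RefreshingToken:
    """Cache a token and fetch a new one shortly before it expires."""

    def __init__(self, fetch, ttl_seconds, margin_seconds=60, clock=time.monotonic):
        self._fetch = fetch            # callable returning a fresh token string
        self._ttl = ttl_seconds        # assumed lifetime of a token
        self._margin = margin_seconds  # refresh this long before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token

# Demonstration with a fake clock and a fake token source.
fake_now = [0.0]
counter = [0]

def fake_fetch():
    counter[0] += 1
    return f"token-{counter[0]}"

tok = RefreshingToken(fake_fetch, ttl_seconds=900, clock=lambda: fake_now[0])
first = tok.get()
fake_now[0] = 600.0    # still inside the assumed lifetime
same = tok.get()
fake_now[0] = 1200.0   # 20 minutes in: the cached token is stale
renewed = tok.get()
print(first, same, renewed)  # token-1 token-1 token-2
```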

Nov 12 '22 02:11 praneshpandurangan-at

@praneshpandurangan-at, @vflaux, what version of eksctl are you using? A fix was out in 0.116 that should address this issue as well.
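
A quick way to check whether an installed eksctl is at or past that release, as a sketch: it parses the "eksctl version:" line in the format shown in the eksctl info output earlier in this thread. The helper names parse_eksctl_version and has_fix are hypothetical, and 0.116.0 is taken from the comment above.

```python
import re

FIXED_IN = (0, 116, 0)  # per the comment above: a fix shipped in 0.116

def parse_eksctl_version(info_output):
    """Extract (major, minor, patch) from text containing 'eksctl version: X.Y.Z'."""
    match = re.search(r"eksctl version:\s*(\d+)\.(\d+)\.(\d+)", info_output)
    if match is None:
        raise ValueError("no eksctl version found")
    return tuple(int(part) for part in match.groups())

def has_fix(info_output):
    """True if the reported version is at least FIXED_IN."""
    return parse_eksctl_version(info_output) >= FIXED_IN

print(has_fix("eksctl version: 0.109.0\nkubectl version: v1.23.8\nOS: linux"))  # False
print(has_fix("eksctl version: 0.118.0"))  # True
```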

Nov 14 '22 09:11 cPu1

I was using 0.112. I just tested with 0.118 and this issue is gone. Fixed by https://github.com/weaveworks/eksctl/pull/5772 I assume. Thanks @cPu1.

Nov 14 '22 12:11 vflaux

> I was using 0.112. I just tested with 0.118 and this issue is gone.

Great. Thanks for updating us, @vflaux.

> Fixed by #5772 I assume. Thanks @cPu1.

Correct!

Nov 14 '22 13:11 cPu1