Experiencing Intermittent 401 Unauthorized Errors from Kube API Server
Describe the bug
We use the Kubernetes JavaScript client library in our Node.js application. Recently, we have been experiencing intermittent 401 Unauthorized errors from the Kube API server.
Error trace:
body: {
kind: 'Status',
apiVersion: 'v1',
metadata: {},
status: 'Failure',
message: 'Unauthorized',
reason: 'Unauthorized',
code: 401
}
Within the Node.js application, the logic lists pods. Most of the time no errors are observed, but occasionally this 401 error is thrown by the Kubernetes client. We started noticing this issue after the latest Kubernetes upgrade, from version 1.22.1 to 1.24.4.
Initially, we suspected the Kubernetes service account token. Starting from Kubernetes 1.24, the token is no longer mounted from a Secret by default; instead, it is projected into the container and refreshed by the kubelet every hour, whereas in 1.22 this token was stored as a Secret. We print the error stack trace in the app. By decoding the token that is passed in the request headers to the kube API server, we found that the token had been generated only a few seconds earlier, and the error happens intermittently when a new token is used right after the hourly token rotation.
However, upon further analysis, it became evident that this was not the root cause: for the most part everything operates smoothly, and the 401 errors appear to be sporadic and random.
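For reference, this is roughly how we decoded the token to check its timestamps (a minimal sketch; the path assumes the default projected service account token, and the helper name is illustrative, not code from our actual app):

import { readFileSync } from 'fs';

// Default location of the projected service account token inside the pod
// (assumption: the default token projection is used).
const TOKEN_PATH = '/var/run/secrets/kubernetes.io/serviceaccount/token';

// Decode the JWT payload (middle, base64url-encoded segment) without verifying the signature.
function decodeTokenPayload(token: string): Record<string, any> {
  const payload = token.split('.')[1];
  const json = Buffer.from(payload.replace(/-/g, '+').replace(/_/g, '/'), 'base64').toString('utf8');
  return JSON.parse(json);
}

const claims = decodeTokenPayload(readFileSync(TOKEN_PATH, 'utf8').trim());
console.log('iat:', new Date(claims.iat * 1000).toISOString());
console.log('exp:', new Date(claims.exp * 1000).toISOString());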
We need help finding the concrete root cause of this issue.
Client Version
0.16.3
Server Version
1.24.4
Example Code
Sample code snippet; the error is thrown from line 5:
1 let kubeConfig = new k8s.KubeConfig();
2 kubeConfig.loadFromDefault();
3 let kubeApi = kubeConfig.makeApiClient(k8s.CoreV1Api);
4 let labelSelector = 'app=' + appName;
5 let res = await kubeApi.listPodForAllNamespaces(false, null, 'status.phase=Running', labelSelector);
Environment (please complete the following information):
- OS: Linux
- Node.js Version: 12.22.12
- Cloud runtime: N/A
There's a comment here: https://github.com/kubernetes-client/javascript/blob/master/src/file_auth.ts#L43
We only poll the file for changes every 60 seconds. That means that we likely cache the token across the token refresh and that may be too long.
We should probably use filesystem events to get an event when the file changes.
I suspect that is what is causing your problem, but it's hard to know without logs or more details.
If you wanted to send a PR to update that code to use events we'd be happy to take it.
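A rough sketch of what that could look like (illustrative only, not the library's current implementation; a real change would need debouncing and a polling fallback, since fs.watch behavior varies by platform):

import { watch, readFileSync } from 'fs';

const TOKEN_DIR = '/var/run/secrets/kubernetes.io/serviceaccount';
const TOKEN_PATH = TOKEN_DIR + '/token';

let cachedToken = readFileSync(TOKEN_PATH, 'utf8').trim();

// The kubelet updates the projected token via a symlink swap, so watching the
// directory tends to be more reliable than watching the file itself.
watch(TOKEN_DIR, () => {
  try {
    cachedToken = readFileSync(TOKEN_PATH, 'utf8').trim();
  } catch {
    // The file may be mid-swap; keep the previous token and wait for the next event.
  }
});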
We only poll the file for changes every 60 seconds. That means that we likely cache the token across the token refresh and that may be too long.
That seems unlikely... the kubelet refreshes the token at 80% of its lifetime, and the minimum lifetime is 10 minutes, which means the file should be getting updated with at least 2 minutes remaining on the previous token's lifetime.
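To make that margin concrete (a quick worked example, using the 1-hour projected token lifetime mentioned in this issue):

// Worked example of the refresh margin (assumes a 1-hour token lifetime).
const lifetimeSeconds = 60 * 60;                 // 3600s
const refreshAt = 0.8 * lifetimeSeconds;         // kubelet refreshes at ~2880s (48 min)
const remaining = lifetimeSeconds - refreshAt;   // ~720s (12 min) left on the old token
// Even a 60-second client-side polling cache would still be serving a token with
// several minutes of validity remaining, so the cache alone should not produce a 401.
console.log({ refreshAt, remaining });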
Thanks @brendandburns and @liggitt. I want to add some more context: we had enabled debug-level logging and got the token from the response header.
When I decoded the token, it looked like the newly created token was indeed the one used by the request, but I am not sure why it was still throwing the 401 error. I am attaching the decoded token output for reference; let me know if anything else would be useful for understanding the issue.
{
"aud": [
"api",
"vault",
"factors"
],
"exp": 1723243859,
"iat": 1691707859,
"iss": "api",
"kubernetes.io": {
"namespace": "my-namespace",
"pod": {
"name": "my-pod-name",
"uid": "dd69f3de-64ad-4230-bbe5-1ae099f164b6"
},
"serviceaccount": {
"name": "my-pod-service-account",
"uid": "a77fc43b-933a-414a-b2d7-bd785d54794f"
},
"warnafter": 1691711466
},
"nbf": 1691707859,
"sub": "system:serviceaccount:my-namespace:my-pod-service-account"
}
Here, the iat (issued-at time) 1691707859 is 10 August 2023 22:50:59 UTC. The error appears in my application logs at exactly 10 August 2023 22:51:02 UTC. So from my understanding, we get this error when a newly created token is used almost immediately after it is issued.
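For completeness, this is how the two timestamps line up (the error time comes from our application logs):

// Convert the iat claim (seconds since epoch) to a UTC timestamp.
const iat = 1691707859;
console.log(new Date(iat * 1000).toISOString()); // 2023-08-10T22:50:59.000Z
// The 401 was logged at 2023-08-10 22:51:02 UTC, roughly 3 seconds after the
// token was issued, i.e. the failing request carried a brand-new token.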
@brendandburns @liggitt can you help out here?
I'm not very familiar with the kubelet token regeneration flow; the client library simply picks up the token from the file and sends it as a header.
Given what @liggitt said about the Kubelet regeneration, I agree that the polling interval is unlikely to be the cause here unless your nodes are seriously overloaded.
If it seems like this is due to kubelet/apiserver interactions, then it probably makes more sense to file a bug on the main kubernetes repository.
I wonder if there is clock skew between your node(s) and the API Server?
I wonder if there is clock skew between your node(s) and the API Server?
getting a 401 response when using a ~brand new token would be more likely to be due to clock skew between API servers, if anything... the node clock isn't in play for validating a brand new token
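If it helps to rule out client-side skew anyway, one rough sanity check is to compare the API server's HTTP Date response header with the local clock (a diagnostic sketch only; per the comment above, skew between API servers themselves would not show up in this check):

import * as https from 'https';

// Assumption: in-cluster API server address; even an unauthorized response carries a Date header.
const apiServer = 'https://kubernetes.default.svc';

// rejectUnauthorized: false skips TLS verification for this throwaway check only.
https.get(apiServer + '/version', { rejectUnauthorized: false }, (res) => {
  const serverDate = new Date(res.headers['date'] as string);
  const skewMs = Date.now() - serverDate.getTime();
  console.log('approximate local clock skew vs API server:', skewMs, 'ms');
  res.resume();
});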
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.