False update events from informer after upgrading from 0.22 to 1.2.0 on AKS clusters
Describe the bug
We noticed that after migrating from version 0.22 to 1.2.0 (and also in 1.1.2), we began receiving false update events from the informer event listener. The object hasn't changed in etcd, yet these events appear roughly every 40 minutes. This behavior did not occur in version 0.22; we verified this through our Datadog logs.
To further confirm the issue, we ran the following command:
kubectl get githubmembership juanjosevazquezgil-68512dbc-257d-48f1-a53a-055f94b8ab7f -n firestartr-github --show-managed-fields=true -o yaml --output-watch-events=true --watch
During the same time period, no changes to the object were shown.
We're deploying the same version of our service in an Amazon EKS cluster, where this behavior does not occur.
Additionally, we use a Kind cluster for local development, and we haven’t observed this issue there either.
Client Version
1.2.0
Server Version
1.30.11
Example Code
import * as k8s from '@kubernetes/client-node';
import Debug from 'debug';
import common from 'catalog_common';
import { getConnection } from './ctl';
import { Store } from './store';

const log = Debug('firestartr:operator:reflector');

/**
 * Execute the callbacks for each item in the store
 * @param {string} plural - Kind to observe
 * @param {string} namespace - Namespace to observe
 * @param {Function} onAdd - Function to call when an item is added
 * @param {Function} onChange - Function to call when an item is modified
 * @param {Function} onDelete - Function to call when an item is deleted
 * @param {Function} onRename - Function to call when an item is renamed
 */
export async function observe(
  plural: string,
  namespace: string,
  onAdd: Function,
  onChange: Function,
  onDelete: Function,
  _onRename: Function,
) {
  const store = new Store(plural);
  const { kc } = await getConnection();
  try {
    const k8sApi = kc.makeApiClient(k8s.CustomObjectsApi);
    const apiGroup = 'firestartr.dev';
    const apiVersion = 'v1';
    const apiPaths = `/apis/${apiGroup}/${apiVersion}/namespaces/${namespace}/${plural}`;
    const listFn = () => {
      return k8sApi.listNamespacedCustomObject({
        group: apiGroup,
        version: apiVersion,
        namespace,
        plural,
      });
    };
    const informer = k8s.makeInformer(
      kc,
      apiPaths,
      listFn as k8s.ListPromise<k8s.KubernetesObject>,
    );
    informer.on('add', (obj: any) => {
      store.add(obj);
      if (store.hasDeletionTimestamp(obj)) {
        log(
          `informer (on add): The item ${obj.metadata.name} has a deletion timestamp, marking for deletion`,
        );
        store.markToDelete(obj);
        onDelete(obj);
      } else {
        log(`informer (on add): The item ${obj.metadata.name} has been added`);
        onAdd(obj);
      }
    });
    informer.on('update', (obj: any) => {
      log(
        `informer (on update): The item ${obj.kind}: ${obj.metadata.name} has been updated`,
      );
      if (
        !store.getItem(obj).markedToDelete &&
        store.hasDeletionTimestamp(obj) &&
        (store.hasBeenMarkedToDelete(obj) || store.modified(obj))
      ) {
        log(
          `informer (on update): The item ${obj.kind}: ${obj.metadata.name} has a deletion timestamp, marking for deletion`,
        );
        store.markToDelete(obj);
        onDelete(obj);
      } else if (store.modified(obj)) {
        log(
          `informer (on update): The item ${obj.kind}: ${obj.metadata.name} has been modified`,
        );
        onChange(obj);
      }
    });
    informer.on('delete', (obj: any) => {
      log(
        `informer (on delete): The item ${obj.kind}: ${obj.metadata.name} has been deleted from etcd`,
      );
      store.remove(obj);
    });
    informer.on('error', (err: any) => {
      log('informer (on error): ERROR %O', err);
      // Restart the informer after a short back-off whenever the watch errors out.
      setTimeout(() => {
        void informer.start();
      }, 5000);
    });
    void informer.start();
  } catch (err) {
    console.error(`Observing: ${plural}: ${err}`);
    common.io.writeFunctionLog(
      'observer',
      `Error Observing: ${plural}: ${err}`,
    );
    throw `Observing: ${plural}: ${err}`;
  }
}
Environment (please complete the following information):
- OS: Linux
- Node.js version: 20
- Cloud runtime: Azure AKS v1.30.11
Not sure whether this could be related in any way to this commit; there don't seem to be many changes going on there.
So my guess is that this is happening on re-list when the connection drops and there is a reconnect. After we list, we do test to make sure we haven't already seen the same resourceVersion:
https://github.com/kubernetes-client/javascript/blob/main/src/cache.ts#L303
It would be interesting if you could log the resourceVersion for the updated resource and see if it has in fact changed, or if it is the same.
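For reference, a minimal sketch of that diagnostic (assuming the informer built in the example code above; the resourceVersion log line is the only addition, and the helper name is made up for illustration):

import * as k8s from '@kubernetes/client-node';

// Sketch: log the resourceVersion on every 'update' so replays of an
// already-seen object state can be spotted (same name, same resourceVersion).
function logResourceVersionOnUpdate(
  informer: k8s.Informer<k8s.KubernetesObject>,
) {
  informer.on('update', (obj: k8s.KubernetesObject) => {
    console.log(
      `update: ${obj.metadata?.name} resourceVersion=${obj.metadata?.resourceVersion}`,
    );
  });
}

If the logged resourceVersion repeats across the 40-minute events, the informer is replaying cached states rather than reflecting a real change.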
I logged the resource version; it is triggering a set of events with previous versions of the object.
I will log more details if you need them.
So, from looking at the code I'm really not sure how this could happen unless the entire informer is being recreated somehow. I wonder if the onError handler is being called multiple times which would then create multiple informers.
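One way to rule that out on the application side would be to guard the restart. A minimal sketch (an illustration under that assumption, not library code; the function name is hypothetical):

import * as k8s from '@kubernetes/client-node';

// Sketch: only one restart may be pending at a time, no matter how many
// 'error' events fire while the watch is down.
function attachGuardedRestart(
  informer: k8s.Informer<k8s.KubernetesObject>,
  delayMs = 5000,
) {
  let restartPending = false;
  informer.on('error', (err: unknown) => {
    console.error('informer error:', err);
    if (restartPending) {
      return; // a restart is already scheduled, do not stack another one
    }
    restartPending = true;
    setTimeout(() => {
      restartPending = false;
      void informer.start();
    }, delayMs);
  });
}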
We always call addOrUpdateObject when we see a resource, and that always checks !isSameVersion
https://github.com/kubernetes-client/javascript/blob/main/src/cache.ts#L204
And that code checks the resource version for equality: https://github.com/kubernetes-client/javascript/blob/main/src/cache.ts#L30
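Paraphrasing that check (a sketch of the idea only, not the actual cache.ts code): an update should only be propagated when the incoming object's metadata.resourceVersion differs from the cached copy's.

import * as k8s from '@kubernetes/client-node';

// Sketch of the dedupe idea: an identical resourceVersion means the object
// state has already been seen, so no 'update' should be emitted for it.
function isSameVersionSketch(
  cached: k8s.KubernetesObject | undefined,
  incoming: k8s.KubernetesObject,
): boolean {
  const cachedVersion = cached?.metadata?.resourceVersion;
  const incomingVersion = incoming.metadata?.resourceVersion;
  return cachedVersion !== undefined && cachedVersion === incomingVersion;
}

Which is why the duplicated events are surprising: a replayed object with an unchanged resourceVersion should be filtered out at that point.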
Sorry for intruding; I'm a workmate of @tonirvega.
We don't see this behavior with 0.22.3, which we had been using on AKS for the past few months until we refactored for ESM.
Maybe it has nothing to do with it, but while reviewing the diff of the cache.ts file between 0.22.3 and 1.1.2, I noticed some commits that are in 0.22.3 but that I don't see in 1.1.2 (maybe they no longer apply?).
Especially https://github.com/kubernetes-client/javascript/commit/d12ec2629a61da2e1cbde81113021c4bb38bfb25, which comes from https://github.com/kubernetes-client/javascript/pull/781, but also https://github.com/kubernetes-client/javascript/commit/eb4d246a93f373d51b109bc122120514e5bd6c93, from https://github.com/kubernetes-client/javascript/pull/965.
@alambike thanks for the pointers. I will try cherry-picking those forward into the 1.x series and we can test.
So I think I figured out part of the problem. On relist, we reset the resource version to the resource version in the list:
https://github.com/kubernetes-client/javascript/blob/main/src/cache.ts#L165
But when I tested against a real server, that resource version is always empty, so we probably shouldn't do that. I still don't see how you get duplicate events, but I'm working on improving the tests.
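For illustration (a sketch of the described fix with a hypothetical helper, not the actual cache.ts change): after a relist, the cached resourceVersion would only be replaced when the list response actually carries a non-empty one.

import * as k8s from '@kubernetes/client-node';

// Sketch: keep the last known resourceVersion when the relist response does
// not include one, instead of resetting it to an empty string.
function nextResourceVersion(
  current: string,
  list: k8s.KubernetesListObject<k8s.KubernetesObject>,
): string {
  const listed = list.metadata?.resourceVersion;
  return listed !== undefined && listed !== '' ? listed : current;
}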
I can confirm this is not happening in 0.22.3; there are no duplicated events for the same resourceVersion.
Just to add a note that we're also experiencing this, and from what I can see it does seem that when the informer disconnects and reconnects again, we receive multiple older versions of the resource, which our informer then handles. We did not have this issue prior to upgrading to v1.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
No; in the previous 0.22.x version we did not experience this issue. We still need to validate this again with the latest v1 versions, as it seems that changes have been introduced to the watch feature since version 1.2.0.