java-operator-sdk

Operator throwing error in an endless loop "too old resource version"

Open fhalde opened this issue 1 year ago • 10 comments

Bug Report

What did you do?

We are not sure what events led to this. It started occurring suddenly. A restart fixed it, but by then the operator was non-functional, i.e. it was not reconciling anything.

What did you expect to see?

No errors

What did you see instead? Under which circumstances?

Our operator is logging the following error in an endless loop:

2024-04-19 08:59:46,858 i.f.k.c.d.i.AbstractWatchManager [ERROR] Received an error which is not a status but {"type":"ERROR","object":{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"too old resource version: 31159423 (31160199)","reason":"Expired","code":410}} - will retry

Environment

Kubernetes cluster type: EKS

$ Mention java-operator-sdk version from pom.xml file

4.8.2

$ java -version

openjdk version "21.0.2" 2024-01-16 LTS
OpenJDK Runtime Environment Corretto-21.0.2.13.1 (build 21.0.2+13-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.2.13.1 (build 21.0.2+13-LTS, mixed mode, sharing)

$ kubectl version

Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.1-eks-b9c9ed7

Possible Solution

Additional context

Unfortunately no, there are no logs prior to this error; it just started occurring out of the blue. We are using the fabric8 client 6.10.0.

fhalde avatar Apr 19 '24 09:04 fhalde

actually, fabric8 6.11.0 is in the dependency tree

fhalde avatar Apr 19 '24 09:04 fhalde

This seems to be an issue with the watches in fabric8 client. cc @manusa @shawkins

csviri avatar Apr 19 '24 09:04 csviri

Classloading issues make this logic susceptible to this problem - https://github.com/fabric8io/kubernetes-client/issues/5692

We could consider making the deserialization here just generic instead, but more than likely the user will want to fix having more than one definition of Status on the classpath.
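As a rough illustration of what type-independent handling of the error event could look like, the sketch below inspects the raw watch event from the log above and checks for a Status object with code 410, without depending on which Java class the payload was deserialized into. This is not fabric8's implementation: the class and method names are hypothetical, and regular expressions are used only to keep the example free of dependencies (a real client would use a JSON parser such as Jackson).

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, dependency-free sketch; not part of fabric8 or JOSDK.
public class WatchErrorSketch {

    // Regex matching is a simplification that stands in for real JSON parsing.
    private static final Pattern STATUS_KIND = Pattern.compile("\"kind\"\\s*:\\s*\"Status\"");
    private static final Pattern CODE = Pattern.compile("\"code\"\\s*:\\s*(\\d+)");

    /** Returns the status code if the event carries a Status object, otherwise empty. */
    static OptionalInt statusCode(String watchEventJson) {
        if (!STATUS_KIND.matcher(watchEventJson).find()) {
            return OptionalInt.empty();
        }
        Matcher m = CODE.matcher(watchEventJson);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    /** Code 410 (reason "Expired") means the requested resourceVersion is too old for the watch. */
    static boolean isExpiredResourceVersion(String watchEventJson) {
        OptionalInt code = statusCode(watchEventJson);
        return code.isPresent() && code.getAsInt() == 410;
    }

    public static void main(String[] args) {
        // The event payload reported in this issue.
        String event = "{\"type\":\"ERROR\",\"object\":{\"kind\":\"Status\",\"apiVersion\":\"v1\","
                + "\"metadata\":{},\"status\":\"Failure\","
                + "\"message\":\"too old resource version: 31159423 (31160199)\","
                + "\"reason\":\"Expired\",\"code\":410}}";
        System.out.println("expired resourceVersion: " + isExpiredResourceVersion(event));
    }
}
```

With a check like this, the decision to restart the watch from a fresh list would not depend on the payload matching a particular Status class.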

shawkins avatar Apr 19 '24 11:04 shawkins

Hi @shawkins

but more than likely the user will want to fix having more than one definition of Status in the classpath

I'm not sure what this Status is that you are referring to. Are you saying I should look at my mvn dependency:tree?

fhalde avatar Apr 19 '24 13:04 fhalde

this is what the deps look like

[INFO] +- io.javaoperatorsdk:operator-framework:jar:4.8.2:compile
[INFO] |  +- io.javaoperatorsdk:operator-framework-core:jar:4.8.2:compile
[INFO] |  |  \- io.fabric8:kubernetes-client:jar:6.11.0:compile
.
.
.
.
[INFO] +- io.strimzi:api:jar:0.40.0:compile
[INFO] |  +- io.fabric8:kubernetes-client-api:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-gatewayapi:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-resource:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-rbac:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-admissionregistration:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-apps:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-autoscaling:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-batch:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-certificates:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-coordination:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-discovery:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-events:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-extensions:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-flowcontrol:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-metrics:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-policy:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-scheduling:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-storageclass:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-node:jar:6.10.0:compile
[INFO] |  |  +- org.snakeyaml:snakeyaml-engine:jar:2.7:compile
[INFO] |  |  \- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:jar:2.16.0:compile
[INFO] |  +- io.fabric8:kubernetes-model-core:jar:6.10.0:compile
[INFO] |  +- io.fabric8:kubernetes-model-networking:jar:6.10.0:compile
[INFO] |  +- io.fabric8:kubernetes-model-common:jar:6.10.0:compile
[INFO] |  +- io.fabric8:kubernetes-model-apiextensions:jar:6.10.0:compile

fhalde avatar Apr 19 '24 14:04 fhalde

is it better to keep the fabric8 version consistent?

fhalde avatar Apr 19 '24 14:04 fhalde

@fhalde yes, it is, especially if you don't have a flat classloader and end up with two different Status class definitions accessible from different classloaders.
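One way to check for this at runtime is sketched below, using only JDK APIs: it lists every location from which a classloader can see a given class file and reports which loader and code source actually supplied a loaded class. The class name DuplicateClassCheck and its helper methods are hypothetical. In a single fat jar, entries with the same path are normally collapsed into one at build time, so this kind of check is mainly informative on a non-flat classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; uses only JDK APIs.
public class DuplicateClassCheck {

    /** Converts a fully qualified class name to its classpath resource path. */
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Lists every location from which the given loader can see the class file. */
    static List<URL> findDefinitions(ClassLoader loader, String className) {
        try {
            return Collections.list(loader.getResources(toResourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Describes which loader and code source supplied an already loaded class. */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return type.getName() + " loaded by " + type.getClassLoader()
                + " from " + (source == null ? "<unknown>" : source.getLocation());
    }

    public static void main(String[] args) {
        String className = args.length > 0 ? args[0] : "io.fabric8.kubernetes.api.model.Status";
        ClassLoader loader = DuplicateClassCheck.class.getClassLoader();

        List<URL> locations = findDefinitions(loader, className);
        System.out.println(className + " visible in " + locations.size() + " location(s)");
        locations.forEach(url -> System.out.println("  " + url));

        try {
            System.out.println(describe(Class.forName(className, false, loader)));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not loadable from this classloader");
        }
    }
}
```

More than one location, or a loader that differs from the one that loaded the fabric8 client classes, would point to the kind of mismatch described above.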

shawkins avatar Apr 19 '24 20:04 shawkins

hmm, we definitely don't make use of any classloaders. Is this something internal to fabric8? Anyway, here is what my fat jar contents look like:

jar -tvf operator.jar | grep '/Status.class'
 io/javaoperatorsdk/operator/health/Status.class
 io/strimzi/api/kafka/model/kafka/Status.class
 io/fabric8/kubernetes/api/model/Status.class
 org/apache/logging/log4j/core/util/internal/Status.class
 ch/qos/logback/core/status/Status.class

@shawkins

fhalde avatar Apr 20 '24 10:04 fhalde

Can you try to make sure that the fabric8 client version that gets put into your fat jar is the same version as the one used by JOSDK?

metacosm avatar Apr 26 '24 08:04 metacosm

Hi @metacosm, we were running our operator with a single version of fabric8 for a few days, and today this error came up once again.

Here is what I could gather by attaching a debugger: the status message was unmarshalled into a GenericKubernetesResource rather than a Status. Weirdly, the error stopped a while after I attached the remote debugger.

If this comes up once again i'll let you know.

fhalde avatar Apr 29 '24 12:04 fhalde

will close this issue, pls let us know if that happens again.

csviri avatar May 27 '24 14:05 csviri