Handle redeployments
Affected version
kafka-operator 0.5.0 and zk-operator 0.9.0
Current and expected behavior
If I apply, delete, and re-apply the following Stackable CRDs, the Kafka cluster works after the first apply, but not anymore after the second one.
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  version: 3.8.0
  servers:
    roleGroups:
      default:
        replicas: 3
        config: {}
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-kafka-znode
spec:
  clusterRef:
    name: simple-zk
    namespace: default
---
apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: simple-kafka
spec:
  version: 3.1.0
  zookeeperConfigMapName: simple-kafka-znode
  brokers:
    roleGroups:
      default:
        replicas: 3
In the logs I can find the following error message:
[2022-04-26 14:21:18,645] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentClusterIdException: The Cluster ID dOMtDqQ_QU6rqOpOeyosIA doesn't match stored clusterId Some(7tfltX7ATz-aIURko5dtnQ) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
    at kafka.server.KafkaServer.startup(KafkaServer.scala:228)
    at kafka.Kafka$.main(Kafka.scala:109)
    at kafka.Kafka.main(Kafka.scala)
The problem is that the ZK cluster ID gets saved during the first apply. Since the volumes persist across kubectl delete -f kafka.yaml, and the ZKCluster generates a new ID on redeploy, the Kafka cluster is stuck.
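For reference, the persisted cluster ID can be checked directly on a broker volume. This is only a sketch; the /stackable search root is my assumption and the actual data directory may differ depending on the operator version:
# Print the cluster ID that Kafka persisted on its volume (meta.properties).
# Assumption: the data directory is mounted somewhere under /stackable.
kubectl exec simple-kafka-broker-default-0 -- \
  sh -c 'find /stackable -name meta.properties -exec cat {} +'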
Possible solution
I am wondering why ZK gets a new ClusterID on every restart. Shouldn't the ID be fixed, since the data inside the cluster doesn't change (persistent volumes)? If the ID change is inevitable, the Kafka cluster should tolerate the ID change of the ZKCluster.
Additional context
No response
Environment
Client Version: v1.23.6
Server Version: v1.22.6
Would you like to work on fixing this bug?
yes
Hi @HannesHil,
sorry this took so long, we missed this issue somehow! I just took a brief look and can at least partially explain what happens and offer a workaround for now.
I think there may be a bit of a misunderstanding that I'd like to clear up first though. You mention that you apply, delete, and re-apply the CRDs from your issue to k8s, and then you write the following:
I am wondering why ZK gets a new ClusterID on every restart.
What you did by deleting the CRDs is not a restart of ZooKeeper, though. It triggers a complete removal of all these services, and you then deploy two new services that are completely unrelated to the first two; they just "happen" to have the same names. Because the names are the same, they end up getting assigned the same PVs. Ideally we'd remove the PVs, but in the case of an accidental deletion that would effectively delete all data stored in the product, so we are a bit hesitant about that. I'll admit that we may need a better story, and at least better docs, around this though.
This hopefully sort of explains why the ID changes: it is a new cluster.
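If throwing everything away really was the intention, a workaround for now is to delete the leftover PVCs yourself after removing the CRDs, so the re-applied clusters start from empty volumes. Something along these lines - the label selector is an assumption about how the volumes are labelled, so please verify with kubectl get pvc first and adjust it, or delete the PVCs by name:
# Remove the PVCs left behind by the deleted clusters.
# WARNING: this permanently deletes all data stored in ZooKeeper and Kafka.
kubectl delete pvc -l app.kubernetes.io/instance=simple-zk
kubectl delete pvc -l app.kubernetes.io/instance=simple-kafka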
I'm not sure what you were trying to achieve by deleting the CRDs. If you just want to restart the products, triggering a restart on the StatefulSets (rolling) or deleting the pods (full) should do the trick.
At least for me, this restarts everything and it all comes back up fine:
kubectl delete pod simple-kafka-broker-default-0 \
  simple-kafka-broker-default-1 \
  simple-kafka-broker-default-2 \
  simple-zk-server-default-0 \
  simple-zk-server-default-1 \
  simple-zk-server-default-2
Or rolling:
kubectl rollout restart statefulset simple-kafka-broker-default simple-zk-server-default
Another note on the znode object you are using: what this object does is request a random folder in ZooKeeper for your product to work in. By recreating that object you actually change the name of the folder in ZK, so Kafka would have no chance of finding its metadata again after you redeploy it.
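If you want to see what that looks like, the znode object publishes its connection details in the discovery ConfigMap that your KafkaCluster references via zookeeperConfigMapName. The exact data keys depend on the operator version, but the connection string in there ends with the randomly generated path, and you will see it change when you recreate the znode object:
# Show the discovery ConfigMap generated for the ZookeeperZnode;
# the connection string it contains includes the random znode path.
kubectl get configmap simple-kafka-znode -o yaml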
I know I have not really answered your question here, just said "you are using it wrong", and I am sorry about that :)
I'll leave this as is for now and wait for a response from you. Happy to explain more, and I'm absolutely not saying that we don't need to change something here :)
@HannesHil @soenkeliebau can this be closed?