clickhouse-operator
Reconciling hangs when function "updateCHIObjectStatus()" catches an error.
context:
clickhouse-operator. Version:0.18.3 GitSHA:76f6a6a BuiltAt:2022-03-04T16:39:42
minikube: version: v1.25.2 --driver=docker --kubernetes-version=v1.22.5 --container-runtime=containerd
tail of logs:
I0318 12:06:25.786309 1 creator.go:32] createStatefulSet()
I0318 12:06:25.786324 1 creator.go:39] Create StatefulSet project1-test/chi-xxx-clickhouse-yyy-s0r1
I0318 12:06:25.859379 1 pods.go:65] deleteLabelReady():FAIL get pod for host project1-test/s0r1 err:pods "chi-xxx-clickhouse-yyy-s0r1-0" not found
I0318 12:06:30.917635 1 poller.go:220] pollStatefulSet():project1-test/chi-xxx-clickhouse-yyy-s0r1:OK :ObservedGeneration:1 Replicas:1 ReadyReplicas:0 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:chi-xxx-clickhouse-yyy-s0r1-894b475d UpdateRevision:chi-xxx-clickhouse-yyy-s0r1-894b475d
E0318 12:07:19.885035 1 poller.go:244] pollStatefulSet():project1-test/chi-xxx-clickhouse-yyy-s0r1:project1-test/chi-xxx-clickhouse-yyy-s0r1 Get() FAILED
I0318 12:07:19.885175 1 creator.go:164] onStatefulSetCreateFailed():going to ignore error project1-test/chi-xxx-clickhouse-yyy-s0r1
E0318 12:07:33.889439 1 controller.go:706] updateCHIObjectStatus():project1-test/xxx-clickhouse/85dfb9bc-b67f-4387-a74a-dd5501637639:"etcdserver: request timed out"
W0318 12:07:33.889540 1 worker.go:1912] createStatefulSet():Create StatefulSet project1-test/chi-xxx-clickhouse-yyy-s0r1 - error ignored
E0318 12:07:37.074836 1 worker.go:1873] reconcileStatefulSet():FAILED to reconcile StatefulSet: chi-xxx-clickhouse-yyy-s0r1 CHI: xxx-clickhouse
E0318 12:07:37.128082 1 worker.go:315] reconcileCHI():project1-test/xxx-clickhouse/85dfb9bc-b67f-4387-a74a-dd5501637639:FAILED update: onStatefulSetCreateFailed - ignore
This happens when the system is slow while the project is starting. In this example, the operator managed to create only two of the four StatefulSets.
After the operator was restarted, it finished the entire process. In versions before 0.18.3, however, restarting did not correct the situation.
@R-omk, thank you for the report. We will take a look at this within the scope of the 0.19 release.
This should have been fixed a while ago.