
Reconciling hangs when function "updateCHIObjectStatus()" catches an error.

Open R-omk opened this issue 3 years ago • 1 comments

context:

clickhouse-operator. Version:0.18.3 GitSHA:76f6a6a BuiltAt:2022-03-04T16:39:42

minikube: version: v1.25.2 --driver=docker --kubernetes-version=v1.22.5 --container-runtime=containerd

tail of logs:

I0318 12:06:25.786309       1 creator.go:32] createStatefulSet()
I0318 12:06:25.786324       1 creator.go:39] Create StatefulSet project1-test/chi-xxx-clickhouse-yyy-s0r1
I0318 12:06:25.859379       1 pods.go:65] deleteLabelReady():FAIL get pod for host project1-test/s0r1 err:pods "chi-xxx-clickhouse-yyy-s0r1-0" not found
I0318 12:06:30.917635       1 poller.go:220] pollStatefulSet():project1-test/chi-xxx-clickhouse-yyy-s0r1:OK  :ObservedGeneration:1 Replicas:1 ReadyReplicas:0 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:chi-xxx-clickhouse-yyy-s0r1-894b475d UpdateRevision:chi-xxx-clickhouse-yyy-s0r1-894b475d
E0318 12:07:19.885035       1 poller.go:244] pollStatefulSet():project1-test/chi-xxx-clickhouse-yyy-s0r1:project1-test/chi-xxx-clickhouse-yyy-s0r1 Get() FAILED
I0318 12:07:19.885175       1 creator.go:164] onStatefulSetCreateFailed():going to ignore error project1-test/chi-xxx-clickhouse-yyy-s0r1
E0318 12:07:33.889439       1 controller.go:706] updateCHIObjectStatus():project1-test/xxx-clickhouse/85dfb9bc-b67f-4387-a74a-dd5501637639:"etcdserver: request timed out"
W0318 12:07:33.889540       1 worker.go:1912] createStatefulSet():Create StatefulSet project1-test/chi-xxx-clickhouse-yyy-s0r1 - error ignored
E0318 12:07:37.074836       1 worker.go:1873] reconcileStatefulSet():FAILED to reconcile StatefulSet: chi-xxx-clickhouse-yyy-s0r1 CHI: xxx-clickhouse 
E0318 12:07:37.128082       1 worker.go:315] reconcileCHI():project1-test/xxx-clickhouse/85dfb9bc-b67f-4387-a74a-dd5501637639:FAILED update: onStatefulSetCreateFailed - ignore



This happens when the system is slow while the project is starting up. In this example, the operator managed to create only two of the four StatefulSets.

After I restarted the operator, it finished the entire process. In versions before 0.18.3, however, restarting did not fix the situation.
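
To illustrate the kind of handling that would avoid the hang, here is a minimal sketch of retrying a status update when it fails with a transient error such as the "etcdserver: request timed out" seen in the log. This is not the operator's actual code: `retryStatusUpdate`, `isTransient`, `errEtcdTimeout` and `errNoAttempts` are hypothetical names, and classifying the error by its message text is an assumption made only for illustration.

```go
package main

import (
	"errors"
	"strings"
	"time"
)

var (
	// errEtcdTimeout mirrors the message from the log above; a sample value only.
	errEtcdTimeout = errors.New("etcdserver: request timed out")
	// errNoAttempts is returned when the caller asks for zero or fewer attempts.
	errNoAttempts = errors.New("attempts must be at least 1")
)

// isTransient is a hypothetical classifier. Matching on the message text
// is an assumption for this sketch, not how the operator decides.
func isTransient(err error) bool {
	return err != nil && strings.Contains(err.Error(), "etcdserver: request timed out")
}

// retryStatusUpdate calls update up to attempts times. It retries only on
// transient errors, doubling the wait between tries, and returns the last
// error if every attempt fails.
func retryStatusUpdate(attempts int, backoff time.Duration, update func() error) error {
	if attempts < 1 {
		return errNoAttempts
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = update()
		if err == nil {
			return nil
		}
		if !isTransient(err) {
			return err // permanent failure: retrying would not help
		}
		if i < attempts-1 {
			time.Sleep(backoff)
			backoff *= 2
		}
	}
	return err
}
```

A caller would wrap whatever performs the status write in the `update` closure, for example `retryStatusUpdate(5, 500*time.Millisecond, func() error { ... })`. If all attempts fail, the error is returned to the caller so that the reconcile can be requeued rather than left hanging.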

R-omk avatar Mar 18 '22 12:03 R-omk

@R-omk, thank you for the report. We will take a look as part of the 0.19 release.

alex-zaitsev avatar Apr 14 '22 09:04 alex-zaitsev

This should have been fixed a while ago.

alex-zaitsev avatar Mar 01 '24 06:03 alex-zaitsev