
operator is not able to create pump pods when `spec.pump.requests.storage` is missing

Open · hoyhbx opened this issue on Jul 17, 2022 · 0 comments

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?

TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-17T16:54:00Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

$ kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  40m

$ kubectl get pvc -n {tidb-cluster-namespace}
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-advanced-tidb-pd-0       Bound    pvc-b566858b-bf4e-4e33-b31e-5d7feb7397b1   10Gi       RWO            standard       10m
pd-advanced-tidb-pd-1       Bound    pvc-df70980f-12cf-499f-8ad7-e41cac98c5d0   10Gi       RWO            standard       10m
pd-advanced-tidb-pd-2       Bound    pvc-d41691d8-feb5-4e21-b282-1ece1851cffa   10Gi       RWO            standard       10m
tikv-advanced-tidb-tikv-0   Bound    pvc-42e652d6-2400-4ae8-b790-cef8466e4566   100Gi      RWO            standard       10m
tikv-advanced-tidb-tikv-1   Bound    pvc-5af08c43-e02d-433c-896a-b85ad568d1ca   100Gi      RWO            standard       10m
tikv-advanced-tidb-tikv-2   Bound    pvc-652761b6-9fff-4080-b13d-9e364062cddc   100Gi      RWO            standard       10m

What's the status of the TiDB cluster pods?

$ kubectl get po -n {tidb-cluster-namespace} -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
advanced-tidb-discovery-6998694d4c-fmpjl   1/1     Running   0          41m   10.244.2.2   test-worker3   <none>           <none>
advanced-tidb-pd-0                         1/1     Running   0          11m   10.244.3.4   test-worker    <none>           <none>
advanced-tidb-pd-1                         1/1     Running   0          11m   10.244.2.4   test-worker3   <none>           <none>
advanced-tidb-pd-2                         1/1     Running   0          11m   10.244.1.4   test-worker2   <none>           <none>
advanced-tidb-tidb-0                       2/2     Running   0          9m    10.244.3.7   test-worker    <none>           <none>
advanced-tidb-tidb-1                       2/2     Running   0          9m    10.244.2.7   test-worker3   <none>           <none>
advanced-tidb-tidb-2                       2/2     Running   0          9m    10.244.1.7   test-worker2   <none>           <none>
advanced-tidb-tikv-0                       1/1     Running   0          10m   10.244.2.6   test-worker3   <none>           <none>
advanced-tidb-tikv-1                       1/1     Running   0          10m   10.244.3.6   test-worker    <none>           <none>
advanced-tidb-tikv-2                       1/1     Running   0          10m   10.244.1.6   test-worker2   <none>           <none>
tidb-controller-manager-6cc68f8949-52vwt   1/1     Running   0          11m   10.244.3.2   test-worker    <none>           <none>
tidb-scheduler-dd569b6b4-hj294             2/2     Running   0          11m   10.244.1.2   test-worker2   <none>           <none>

What did you do?

We deployed the CR file below with `spec.pump.replicas` set to 5. Note that we did not specify `spec.pump.requests.storage`:

The CR file
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: test-cluster
spec:
  configUpdateStrategy: RollingUpdate
  enableDynamicConfiguration: true
  helper:
    image: busybox:1.34.1
  pd:
    baseImage: pingcap/pd
    config: "[dashboard]\n  internal-proxy = true\n"
    maxFailoverCount: 0
    mountClusterClientSecret: true
    replicas: 3
    requests:
      storage: 10Gi
  pump:
    replicas: 5
  pvReclaimPolicy: Retain
  tidb:
    baseImage: pingcap/tidb
    config: "[performance]\n  tcp-keep-alive = true\n"
    maxFailoverCount: 0
    replicas: 3
    service:
      externalTrafficPolicy: Local
      type: NodePort
  tikv:
    baseImage: pingcap/tikv
    config: |
      log-level = "info"
    maxFailoverCount: 0
    mountClusterClientSecret: true
    replicas: 3
    requests:
      storage: 100Gi
  timezone: UTC
  version: v5.4.0
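
For comparison, pump pods are created successfully once a storage request is added under the pump spec, mirroring the pd and tikv sections above. A minimal sketch of the working variant (the 10Gi size is an arbitrary value for illustration):

  pump:
    replicas: 5
    requests:
      storage: 10Gi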

What did you expect to see? We expected the pump pods to be created. Failing that, we expected the input to be rejected with a clear error message if the operator considers it invalid.

What did you see instead? The pump pods fail to be created. Checking the events, we found the creation was rejected because no storage request is specified for the pump spec. However, since storage is not a required field in the CRD and the operator has no validation rule for it, pod creation fails silently.
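
As a sketch of the validation we would have expected: the tidbclusters CRD's OpenAPI v3 schema could require a storage request whenever a pump spec is present, so the CR is rejected at admission time instead of failing silently during reconciliation. The fragment below is hypothetical; the actual CRD does not enforce this, which is the point of this report:

  # Hypothetical fragment of the tidbclusters CRD schema (not the real CRD)
  pump:
    type: object
    required:
    - requests
    properties:
      requests:
        type: object
        required:
        - storage
        properties:
          storage:
            type: string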
