operator is not able to create pump pods when `spec.pump.resources.request.storage` is missing
Bug Report
What version of Kubernetes are you using?
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
What version of TiDB Operator are you using?
TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-17T16:54:00Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
standard (default) rancher.io/local-path Delete WaitForFirstConsumer false 40m
$ kubectl get pvc -n {tidb-cluster-namespace}
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pd-advanced-tidb-pd-0 Bound pvc-b566858b-bf4e-4e33-b31e-5d7feb7397b1 10Gi RWO standard 10m
pd-advanced-tidb-pd-1 Bound pvc-df70980f-12cf-499f-8ad7-e41cac98c5d0 10Gi RWO standard 10m
pd-advanced-tidb-pd-2 Bound pvc-d41691d8-feb5-4e21-b282-1ece1851cffa 10Gi RWO standard 10m
tikv-advanced-tidb-tikv-0 Bound pvc-42e652d6-2400-4ae8-b790-cef8466e4566 100Gi RWO standard 10m
tikv-advanced-tidb-tikv-1 Bound pvc-5af08c43-e02d-433c-896a-b85ad568d1ca 100Gi RWO standard 10m
tikv-advanced-tidb-tikv-2 Bound pvc-652761b6-9fff-4080-b13d-9e364062cddc 100Gi RWO standard 10m
What's the status of the TiDB cluster pods?
$ kubectl get po -n {tidb-cluster-namespace} -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
advanced-tidb-discovery-6998694d4c-fmpjl 1/1 Running 0 41m 10.244.2.2 test-worker3 <none> <none>
advanced-tidb-pd-0 1/1 Running 0 11m 10.244.3.4 test-worker <none> <none>
advanced-tidb-pd-1 1/1 Running 0 11m 10.244.2.4 test-worker3 <none> <none>
advanced-tidb-pd-2 1/1 Running 0 11m 10.244.1.4 test-worker2 <none> <none>
advanced-tidb-tidb-0 2/2 Running 0 9m 10.244.3.7 test-worker <none> <none>
advanced-tidb-tidb-1 2/2 Running 0 9m 10.244.2.7 test-worker3 <none> <none>
advanced-tidb-tidb-2 2/2 Running 0 9m 10.244.1.7 test-worker2 <none> <none>
advanced-tidb-tikv-0 1/1 Running 0 10m 10.244.2.6 test-worker3 <none> <none>
advanced-tidb-tikv-1 1/1 Running 0 10m 10.244.3.6 test-worker <none> <none>
advanced-tidb-tikv-2 1/1 Running 0 10m 10.244.1.6 test-worker2 <none> <none>
tidb-controller-manager-6cc68f8949-52vwt 1/1 Running 0 11m 10.244.3.2 test-worker <none> <none>
tidb-scheduler-dd569b6b4-hj294 2/2 Running 0 11m 10.244.1.2 test-worker2 <none> <none>
What did you do?
We deployed the CR below with spec.pump.replicas set to 5. Note that we did not specify spec.pump.resources.request.storage:
The CR file
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: test-cluster
spec:
  configUpdateStrategy: RollingUpdate
  enableDynamicConfiguration: true
  helper:
    image: busybox:1.34.1
  pd:
    baseImage: pingcap/pd
    config: "[dashboard]\n internal-proxy = true\n"
    maxFailoverCount: 0
    mountClusterClientSecret: true
    replicas: 3
    requests:
      storage: 10Gi
  pump:
    replicas: 5
  pvReclaimPolicy: Retain
  tidb:
    baseImage: pingcap/tidb
    config: "[performance]\n tcp-keep-alive = true\n"
    maxFailoverCount: 0
    replicas: 3
    service:
      externalTrafficPolicy: Local
      type: NodePort
  tikv:
    baseImage: pingcap/tikv
    config: 'log-level = "info"
      '
    maxFailoverCount: 0
    mountClusterClientSecret: true
    replicas: 3
    requests:
      storage: 100Gi
  timezone: UTC
  version: v5.4.0
What did you expect to see?
We expected the pump pods to be created. At the very least, we expected the operator to reject the input with a clear error message if it considers the input invalid.
What did you see instead?
The pump pods were never created. We checked the events and found that the request was rejected because no storage resource was specified in the pump spec. However, storage is not a required field in the CRD and the operator has no validation rule for it, so the CR was accepted and pod creation failed silently.
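For comparison, a pump section that does carry a storage request might look like the sketch below. It assumes the pump spec accepts the same requests.storage layout that pd and tikv use in the CR above. The 10Gi size is an arbitrary placeholder and is not taken from our deployment.

```yaml
spec:
  pump:
    replicas: 5
    requests:
      # Assumed layout, mirroring spec.pd.requests.storage in the CR above.
      # 10Gi is a placeholder value, not a recommendation.
      storage: 10Gi
```

We would expect either that the operator treats a CR without this block as invalid and reports it, or that it applies a default so the pump pods can still be created.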