Fix bug with Jobs
Occasionally we hit a bug when looping to find a job. Here is the error I am getting:
logger.go:130: 2021-05-27T17:00:56.697Z WARN job pod is ready {"action": "Crdb Version Validator"}
logger.go:130: 2021-05-27T17:00:56.782Z WARN completed version checker {"action": "Crdb Version Validator", "CrdbCluster": "crdb-test-pxntwh/crdb", "calVersion": "v20.2.8", "containerImage": "cockroachdb/cockroach:v20.2.8"}
logger.go:130: 2021-05-27T17:00:56.782Z INFO request was interrupted {"CrdbCluster": "crdb-test-pxntwh/crdb"}
logger.go:130: 2021-05-27T17:00:56.782Z INFO reconciling CockroachDB cluster {"CrdbCluster": "crdb-test-pxntwh/crdb"}
logger.go:130: 2021-05-27T17:00:56.782Z INFO Running action with index: 0 and name: Decommission {"CrdbCluster": "crdb-test-pxntwh/crdb"}
logger.go:130: 2021-05-27T17:00:56.782Z WARN check decommission oportunities {"action": "decommission", "CrdbCluster": "crdb-test-pxntwh/crdb"}
logger.go:130: 2021-05-27T17:00:56.782Z INFO replicas decommisioning {"action": "decommission", "CrdbCluster": "crdb-test-pxntwh/crdb", "status.CurrentReplicas": 3, "expected": 3}
logger.go:130: 2021-05-27T17:00:56.782Z INFO Running action with index: 1 and name: VersionCheckerAction {"CrdbCluster": "crdb-test-pxntwh/crdb"}
logger.go:130: 2021-05-27T17:00:56.782Z WARN starting to check the crdb version of the container provided {"action": "Crdb Version Validator", "CrdbCluster": "crdb-test-pxntwh/crdb"}
logger.go:130: 2021-05-27T17:00:56.782Z WARN User set image.name, using that field instead of cockroachDBVersion {"action": "Crdb Version Validator", "CrdbCluster": "crdb-test-pxntwh/crdb"}
logger.go:130: 2021-05-27T17:00:56.794Z ERROR failed to reconcile job only err {"action": "Crdb Version Validator", "CrdbCluster": "crdb-test-pxntwh/crdb", "error": "Job.batch \"crdb-vcheck-27035580\" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:\"\", GenerateName:\"\", Namespace:\"\", SelfLink:\"\", UID:\"\", ResourceVersion:\"\", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{\"app.kubernetes.io/component\":\"database\", \"app.kubernetes.io/instance\":\"crdb\", \"app.kubernetes.io/name\":\"cockroachdb\", \"controller-uid\":\"a0255182-4d4a-4c98-af46-1cf5eee46a3e\", \"job-name\":\"crdb-vcheck-27035580\"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:\"\", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:core.PodSpec{Volumes:[]core.Volume(nil), InitContainers:[]core.Container(nil), Containers:[]core.Container{core.Container{Name:\"crdb\", Image:\"cockroachdb/cockroach:v20.2.9\", Command:[]string{\"/bin/bash\"}, Args:[]string{\"-c\", \"/cockroach/cockroach.sh version | grep 'Build Tag:'| awk '{print $3}'; sleep 150\"}, WorkingDir:\"\", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar(nil), Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), Requests:core.ResourceList(nil)}, VolumeMounts:[]core.VolumeMount(nil), VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:\"/dev/termination-log\", TerminationMessagePolicy:\"File\", ImagePullPolicy:\"IfNotPresent\", SecurityContext:(*core.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]core.EphemeralContainer(nil), 
RestartPolicy:\"Never\", TerminationGracePeriodSeconds:(*int64)(0xc0158f7b20), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:\"ClusterFirst\", NodeSelector:map[string]string(nil), ServiceAccountName:\"cockroach-database-sa\", AutomountServiceAccountToken:(*bool)(0xc0158f7b28), NodeName:\"\", SecurityContext:(*core.PodSecurityContext)(0xc01b036500), ImagePullSecrets:[]core.LocalObjectReference(nil), Hostname:\"\", Subdomain:\"\", SetHostnameAsFQDN:(*bool)(nil), Affinity:(*core.Affinity)(nil), SchedulerName:\"default-scheduler\", Tolerations:[]core.Toleration(nil), HostAliases:[]core.HostAlias(nil), PriorityClassName:\"\", Priority:(*int32)(nil), PreemptionPolicy:(*core.PreemptionPolicy)(nil), DNSConfig:(*core.PodDNSConfig)(nil), ReadinessGates:[]core.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), Overhead:core.ResourceList(nil), EnableServiceLinks:(*bool)(nil), TopologySpreadConstraints:[]core.TopologySpreadConstraint(nil)}}: field is immutable"}
logger.go:130: 2021-05-27T17:00:56.794Z WARN version checker {"action": "Crdb Version Validator", "CrdbCluster": "crdb-test-pxntwh/crdb", "job": "crdb-vcheck-27035580"}
logger.go:130: 2021-05-27T17:00:56.799Z WARN job pod is ready {"action": "Crdb Version Validator"}
logger.go:130: 2021-05-27T17:00:56.883Z WARN completed version checker {"action": "Crdb Version Validator", "CrdbCluster": "crdb-test-pxntwh/crdb", "calVersion": "v20.2.8", "containerImage": "cockroachdb/cockroach:v20.2.8"}
We are recovering, but this will look weird to an end user.
@alinadonisa @keith-mcclellan PTAL
@chrislovecnm the job already has Image: "cockroachdb/cockroach:v20.2.9" and you are reconciling for version "v20.2.8". What scenario are you running? If you run things in parallel, or reconcile within the same one-minute window, the runs will generate the same timestamp and the job name will be identical across runs.
It happens occasionally while running our e2e tests.
@davidwding can we close this?