cockroach-operator

Pods cannot start anymore after scaling replicas down to zero

Open pornpoi opened this issue 4 years ago • 4 comments

I am using OpenShift, and yesterday I found a problem: the pods could not self-heal after the cluster had been running for 2 weeks. I had to uninstall and reinstall the cluster to make it work again. I have now found steps that reproduce the crash:

  1. I set it up by following https://github.com/cockroachdb/cockroach-operator/blob/master/openshift.md and it worked fine.
  2. I scaled down the StatefulSet by changing replicas: 3 to replicas: 0.
  3. I waited for all pods to terminate.
  4. I changed replicas: 0 back to replicas: 3.
  5. The pods won't start anymore.
  6. These are the logs from the container that tries to start:

crdb-tls-nnn-0-db.log

Cockroach version : v20.1.5

pornpoi, Mar 30 '21 14:03

Log file since pod start:

crdb-tls-nnn-0-db (1).log

pornpoi, Mar 30 '21 14:03

Hi @pornpoi, this actually appears to be a database bug and not an operator bug (but that's not to say we can't do something in the operator to prevent this condition from happening). I was told this may be fixed in the latest DB version (v20.2.7). Could you try that, please? We might need to create an issue in github.com/cockroachdb/cockroach as well, though.

keith-mcclellan, Apr 02 '21 17:04

Hi @keith-mcclellan, yes, I can try. However, I have a problem when I try to use a version other than the default one. (image) I found that 1 of the 3 pods still uses the default version (20.1.5). Could you tell me the correct way to change the version with the operator?

pornpoi, Apr 06 '21 08:04

We discussed this internally and we think the issue is that the default start pattern launches pods one at a time, rather than all at once. We're looking into changing the default behavior of the StatefulSet to allow this to work. In the meantime, you could manually edit the StatefulSet and/or CR to manage the pods in parallel. See https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#parallel-pod-management on how to do this.

keith-mcclellan, Apr 09 '21 15:04
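
Editor's sketch of the one-at-a-time versus all-at-once explanation in the last comment. This is a toy model in Python, not Kubernetes or operator code. The Kubernetes field involved is the StatefulSet's `spec.podManagementPolicy`, whose default is `OrderedReady` and whose alternative is `Parallel`. The function name `simulate_startup` and the readiness rule (a pod reports Ready only once a majority of nodes is running) are assumptions made for illustration; the thread does not state why ordered startup fails.

```python
"""Toy model of StatefulSet pod startup; not real Kubernetes or operator code."""

ORDERED_READY = "OrderedReady"  # Kubernetes default: create pods one at a time
PARALLEL = "Parallel"           # create all pods without waiting for readiness


def simulate_startup(replicas: int, policy: str) -> int:
    """Return how many pods end up Ready when scaling from 0 to `replicas`.

    Assumption (hypothetical, for illustration only): a pod reports Ready
    only once a majority of the cluster's nodes is running.
    """
    quorum = replicas // 2 + 1

    if policy == PARALLEL:
        # All pods are created at once, so every node is running.
        running = replicas
    elif policy == ORDERED_READY:
        running = 0
        for _ in range(replicas):
            running += 1
            if running < quorum:
                # The newest pod never becomes Ready, so the controller
                # never creates the next one.
                break
    else:
        raise ValueError(f"unknown policy: {policy!r}")

    # Pods are Ready only if enough of them are running to form a majority.
    return running if running >= quorum else 0


if __name__ == "__main__":
    for policy in (ORDERED_READY, PARALLEL):
        print(policy, simulate_startup(3, policy))
```

Under this model, a 3-replica cluster scaled up from zero never gets past the first pod with `OrderedReady`, while `Parallel` brings up all three. Note that, as far as the Kubernetes documentation describes it, `podManagementPolicy` cannot be changed on an existing StatefulSet, so applying it in practice may mean recreating the StatefulSet rather than editing it in place, and the operator may reconcile manual changes.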