Rolling upgrade of Gluster

Open JohnStrunk opened this issue 7 years ago • 0 comments

Describe the feature you'd like to have.

The operator should be able to non-disruptively upgrade a Gluster cluster. When an admin changes the Gluster template, the operator should automatically roll out the change to the entire cluster. As it does so, it must ensure that data volumes are available and healed, so that restarting a pod causes neither a loss of availability nor a split-brain situation.

What is the value to the end user? (why is it a priority?)

Gluster upgrades need to be carefully choreographed to maintain cluster health and data availability. Relying on an admin to carry out these steps is both time-consuming and error-prone. Implementing this feature ensures that best practices are followed during the upgrade, while also freeing the admin from performing the steps by hand.

How will we know we have a good solution? (acceptance criteria)

  • The operator ensures affected volumes are fully healed before upgrading each pod
  • Admin can specify a desired version to roll out, and the operator will ensure all pods match that version
  • Interacts properly with GD2 to prevent races with auto volume management that could compromise data
  • Tracks and ensures compatibility of client and server versions (CSI as gluster client; Gluster pod as server).
    • Operator will refuse to advance the client version beyond the server version, in keeping with best practice
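
The client/server version rule above could be checked along these lines. This is only a minimal sketch, not the operator's actual code: `parse_version` and `client_upgrade_allowed` are hypothetical names, and it assumes versions are plain dotted integers with the same number of components.

```python
from typing import Iterable, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "4.1.2" into a comparable tuple (4, 1, 2).

    Assumption: versions are plain dotted integers of equal length.
    """
    return tuple(int(part) for part in text.split("."))


def client_upgrade_allowed(target_client: str, server_versions: Iterable[str]) -> bool:
    """Allow a client (CSI) version only if no server pod runs an older version."""
    servers = [parse_version(v) for v in server_versions]
    if not servers:
        # No known servers: refuse, since compatibility cannot be established.
        return False
    # The oldest server pod bounds the highest client version we may roll out.
    return parse_version(target_client) <= min(servers)
```

Comparing against the oldest server pod means that, in the middle of a rolling server upgrade, the client stays pinned until every Gluster pod has reached the new version.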

Work items

  • [ ] Operator can monitor heal status
  • [ ] Operator can monitor node health
  • [ ] Operator can quiesce a node via state tag
  • [ ] Operator can query and modify the Deployment for a Gluster pod to sync with topology template
  • [ ] Upgrade of CSI driver: #15
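
Taken together, the work items suggest a reconcile pass roughly like the one below. It is a hedged sketch under stated assumptions: `Pod`, `rolling_upgrade`, and the four callbacks are hypothetical stand-ins for the heal-status query, node-health query, GD2 state-tag quiesce, and Deployment update, none of which are specified in this issue.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pod:
    name: str
    version: str


def rolling_upgrade(
    pods: List[Pod],
    target_version: str,
    volumes_healed: Callable[[Pod], bool],   # hypothetical: heal status for volumes on this pod
    node_healthy: Callable[[Pod], bool],     # hypothetical: node health check
    quiesce: Callable[[Pod], None],          # hypothetical: set the node state tag
    upgrade: Callable[[Pod, str], None],     # hypothetical: update the pod's Deployment
) -> List[str]:
    """Upgrade pods one at a time; stop at the first pod that is not safe to restart.

    Returns the names of the pods upgraded during this pass.
    """
    upgraded: List[str] = []
    for pod in pods:
        if pod.version == target_version:
            continue  # already at the desired version
        if not (node_healthy(pod) and volumes_healed(pod)):
            break  # unsafe to restart now; retry on a later reconcile pass
        quiesce(pod)
        upgrade(pod, target_version)
        pod.version = target_version
        upgraded.append(pod.name)
    return upgraded
```

Stopping at the first unsafe pod, rather than skipping it, keeps at most one pod in flight and leaves the remaining work for the next pass, once healing has completed.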

Additional context

Interacts with:

  • GD2 node state tags
  • Operator deploying CSI: #8

JohnStrunk • Jun 26 '18 17:06