[Feature] Upgrade - Allow node drain to be customised or change upgrade process due to PV/zone affinity
Is your feature request related to a problem? Please describe.
We run HashiCorp Vault on AKS. Each Consul pod has to run in a specific availability zone because its Persistent Volume (PV) is bound to that zone. We have a Pod Disruption Budget (PDB) that allows only 1 Consul pod to be unavailable at a time. Even with that PDB and Max Surge set on the node pools, upgrades cause downtime.
When upgrading via AKS:
Node pool Max Surge is 33% or 1: AKS creates a surge node in zone 1 but then drains the node in zone 3. Consul has to run in zone 3, but the new node is in zone 1, so the pod ends up Pending, which means downtime.
Node pool Max Surge is 100% or 3: AKS duplicates the nodes, which is good, because the new nodes cover all AZs and Consul can be scheduled onto a node in the right zone. The problem with this scenario is that AKS drains ALL old nodes at once (even with the Pod Disruption Budget), which leads to downtime because Consul takes a while to start on the new nodes.
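To make the Max Surge = 1 case concrete, here is a minimal Python model of it. This is only a sketch of the scheduling constraint as described above, not of how AKS is implemented; the `Node` type, the node names, and the `can_reschedule` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    zone: str


def can_reschedule(pod_zone: str, drained: Node, nodes: list[Node]) -> bool:
    """A pod bound to a zonal PV can only move to another node in the same zone."""
    return any(n.zone == pod_zone and n != drained for n in nodes)


# Hypothetical pool: one old node per zone, plus a single surge node.
old_nodes = [Node("old-1", "1"), Node("old-2", "2"), Node("old-3", "3")]
surge_node = Node("surge-1", "1")  # Max Surge = 1: the surge node landed in zone 1
schedulable = old_nodes + [surge_node]

# The upgrade drains the zone 3 node, as in the scenario above.
drained = old_nodes[2]
consul_pod_zone = "3"  # the pod's PV lives in zone 3

# No other node exists in zone 3, so the evicted pod has nowhere to go.
pending = not can_reschedule(consul_pod_zone, drained, schedulable)
print(pending)  # True
```

In this model the pod only avoids Pending if the surge node happens to be in the same zone as the node being drained.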
Describe the solution you'd like
Implement a mechanism that controls the draining process. I want temporary nodes to be available in all availability zones, but the old nodes to be drained one by one.
OR
Another solution would be to improve the upgrade process with Max Surge = 1: create the temporary node in the same zone as the old node being replaced, and only then destroy that old node.
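The behaviour requested in both options could look roughly like the following Python sketch: every old node is paired with a temporary node in the same zone, and the pairs are then processed one at a time. This models the request only and is not an existing AKS feature; `plan_zone_aware_drain` and the node names are hypothetical.

```python
from collections import defaultdict


def plan_zone_aware_drain(old_nodes, surge_nodes):
    """Pair each old node with a surge node in the same zone, one step per old node.

    Nodes are (name, zone) tuples. Raises ValueError if a zone has no surge
    node left, rather than planning a drain whose pods could not be rescheduled.
    """
    available = defaultdict(list)
    for name, zone in surge_nodes:
        available[zone].append(name)

    plan = []
    for name, zone in old_nodes:
        if not available[zone]:
            raise ValueError(f"no surge node in zone {zone} for {name}")
        plan.append((name, available[zone].pop(0)))
    return plan


# Hypothetical pool with one old node and one surge node per zone.
old = [("old-1", "1"), ("old-2", "2"), ("old-3", "3")]
surge = [("new-3", "3"), ("new-1", "1"), ("new-2", "2")]

for old_name, new_name in plan_zone_aware_drain(old, surge):
    # Each step would drain one old node and wait for its pods to become
    # ready on the paired node before moving on to the next step.
    print(f"drain {old_name} -> pods can move to {new_name}")
```

With Max Surge = 1 the same idea applies with a single pair at a time: the temporary node is created in the zone of the node about to be drained.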
Describe alternatives you've considered
I configured Max Surge and a Pod Disruption Budget as shown below, but even with these 2 mechanisms we still have issues.
Pod Disruption Budget:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: consul
  namespace: vault
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: consul
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads