bug: vmcluster parallel rollout of nodes
vmoperator v0.65.0
I just applied a change to upgrade the tags for all 3 nodes and also added GOMAXPROCS to extraEnvs. Instead of waiting for each node to complete its rollout, all 3 were updated in parallel.
Example VMCluster CR:
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: canary
  namespace: monitoring
spec:
  replicationFactor: 2
  retentionPeriod: "3"
  vminsert:
    extraEnvs:
      - name: GOMAXPROCS
        value: "2"
    hpa:
      maxReplicas: 12
      metrics:
        - pods:
            metric:
              name: vm_concurrent_insert_utilization
            target:
              averageValue: 800m
              type: AverageValue
          type: Pods
      minReplicas: 6
    image:
      tag: v1.131.0-cluster
    minReadySeconds: 180
    podDisruptionBudget:
      maxUnavailable: 1
    port: "8480"
    priorityClassName: monitoring-canary
    replicaCount: 6
    resources:
      limits:
        memory: 2Gi
      requests:
        cpu: 500m
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    serviceSpec:
      metadata:
        annotations:
          service.kubernetes.io/topology-mode: Auto
        name: vminsert-canary-az-aware
      spec: {}
    terminationGracePeriodSeconds: 300
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
  vmselect:
    cacheMountPath: /select-cache
    clusterNativeListenPort: "8401"
    extraArgs:
      dedup.minScrapeInterval: 60s
    extraEnvs:
      - name: GOMAXPROCS
        value: "4"
    hpa:
      maxReplicas: 6
      metrics:
        - pods:
            metric:
              name: vm_concurrent_select_utilization
            target:
              averageValue: 800m
              type: AverageValue
          type: Pods
      minReplicas: 3
    image:
      tag: v1.131.0-cluster
    minReadySeconds: 30
    podDisruptionBudget:
      maxUnavailable: 1
    port: "8481"
    priorityClassName: monitoring-canary
    replicaCount: 2
    resources:
      limits:
        memory: 3Gi
      requests:
        cpu: 1500m
    rollingUpdateStrategy: RollingUpdate
    serviceSpec:
      metadata:
        annotations:
          service.kubernetes.io/topology-mode: Auto
        name: vmselect-canary-az-aware
      spec: {}
    storage:
      emptyDir: {}
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 2Gi
    terminationGracePeriodSeconds: 60
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
  vmstorage:
    extraArgs:
      dedup.minScrapeInterval: 60s
    extraEnvs:
      - name: GOMAXPROCS
        value: "8"
    image:
      tag: v1.131.0-cluster
    minReadySeconds: 180
    podDisruptionBudget:
      maxUnavailable: 1
    priorityClassName: canary-monitoring
    replicaCount: 6
    resources:
      requests:
        cpu: '7'
      limits:
        memory: 27Gi
    rollingUpdateStrategy: RollingUpdate
    storage:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 500Gi
          storageClassName: hdd
    storageDataPath: /vm-data
    terminationGracePeriodSeconds: 900
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
```
Setting `maxUnavailable: 100%` will allow restarting all instances of a component at once, but there will be a small amount of downtime. If you need an HA pair of clusters, look at the distributed Helm chart.
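As a sketch of that suggestion, applied to the vminsert section of the CR above (my reading of where the field goes; `100%` is an intstr percentage, not a count):

```yaml
# Sketch: trade availability for rollout speed on vminsert.
# With maxUnavailable: 100%, the controller may take every replica
# down at once, so expect a brief ingestion outage during the update.
vminsert:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 100%
```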
> I just applied a change to upgrade the tags for all 3 nodes and also added GOMAXPROCS to extraEnvs. Instead of waiting for each node to complete its rollout, all 3 were updated in parallel.
Could you please be a bit more specific? Which node tags were changed (Kubernetes Node)? And what do you mean by "all 3 were updated in parallel" (3 storage pods updated at once)?
I can only guess that it's related to the podDisruptionBudget and maxUnavailable: 1.
Also, with `rollingUpdateStrategy: RollingUpdate`, the operator doesn't perform any pod-level actions; it delegates the update process to the kubernetes-controller-manager.
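For reference, if you want the operator itself to orchestrate pod replacement rather than the controller-manager, the component specs also accept `rollingUpdateStrategy: OnDelete` (my understanding of the operator's behavior; verify against the operator docs for your version):

```yaml
# Sketch: with OnDelete, kubernetes-controller-manager does not
# replace pods on spec changes; the operator deletes and waits for
# pods itself, which keeps the rollout under its control.
vmstorage:
  rollingUpdateStrategy: OnDelete
```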
> Could you please be a bit more specific? Which node tags were changed (Kubernetes Node)? And what do you mean by "all 3 were updated in parallel" (3 storage pods updated at once)?
Sorry, the vmselect, vminsert, and vmstorage components were all updated in parallel.
> Also, with `rollingUpdateStrategy: RollingUpdate`, the operator doesn't perform any pod-level actions; it delegates the update process to the kubernetes-controller-manager.
Yup, the operator updated the vmstorage and vmselect StatefulSets and the vminsert Deployment at roughly the same time, rather than doing them serially.
@AndrewChubatiuk @vrutkovs Could you please take a look? I was not able to reproduce it. Most probably, the Kubernetes controller-manager doesn't properly reflect statefulSet.status updates during the rollout, and the operator relies on that status with `rollingUpdateStrategy: RollingUpdate`.
I started rolling out an image-only change this week, and so far this has happened only once out of 6 VMCluster objects.
I didn't notice it until about 15 minutes after it happened. Here's the status of the vmstorage STS:
```yaml
status:
  availableReplicas: 5
  collisionCount: 0
  currentReplicas: 2
  currentRevision: vmstorage-internal-686c76f5b8
  observedGeneration: 10
  readyReplicas: 5
  replicas: 6
  updateRevision: vmstorage-internal-545b68688f
  updatedReplicas: 3
```
The vmoperator only logged this when it updated the vmselect STS (with the diff showing the image change):
```
{"logger":"controller.VMCluster","msg":"updating statefulset vmselect-internal configuration, is_current_equal=false,is_prev_equal=false,is_prev_nil=false"}
```
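For context, a serial rollout would typically gate on each StatefulSet's status before touching the next component. A minimal sketch of such a readiness check (an illustration, not the operator's actual code), applied to the vmstorage status above:

```python
def statefulset_rolled_out(status: dict) -> bool:
    """A rollout is complete once the update revision has become the
    current revision and every replica is updated and ready."""
    return (
        status["updateRevision"] == status["currentRevision"]
        and status["updatedReplicas"] == status["replicas"]
        and status["readyReplicas"] == status["replicas"]
    )

# The vmstorage status from the report: mid-rollout, not complete.
mid_rollout = {
    "replicas": 6,
    "readyReplicas": 5,
    "updatedReplicas": 3,
    "currentRevision": "vmstorage-internal-686c76f5b8",
    "updateRevision": "vmstorage-internal-545b68688f",
}
print(statefulset_rolled_out(mid_rollout))  # False
```

If the controller-manager lags in publishing these status fields, a watcher gating on them could be misled, which matches the maintainer's suspicion above.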