bug: vmcluster parallel rollout of nodes

Open dctrwatson opened this issue 1 month ago • 3 comments

vmoperator v0.65.0

I just applied a change to upgrade the tags for all 3 nodes and also added GOMAXPROCS to extraEnvs. Instead of waiting for each node to complete its rollout, all 3 were updated in parallel.

Example VMCluster CR:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: canary
  namespace: monitoring
spec:
  replicationFactor: 2
  retentionPeriod: "3"
  vminsert:
    extraEnvs:
      - name: GOMAXPROCS
        value: "2"
    hpa:
      maxReplicas: 12
      metrics:
        - pods:
            metric:
              name: vm_concurrent_insert_utilization
            target:
              averageValue: 800m
              type: AverageValue
          type: Pods
      minReplicas: 6
    image:
      tag: v1.131.0-cluster
    minReadySeconds: 180
    podDisruptionBudget:
      maxUnavailable: 1
    port: "8480"
    priorityClassName: monitoring-canary
    replicaCount: 6
    resources:
      limits:
        memory: 2Gi
      requests:
        cpu: 500m
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    serviceSpec:
      metadata:
        annotations:
          service.kubernetes.io/topology-mode: Auto
        name: vminsert-canary-az-aware
      spec: {}
    terminationGracePeriodSeconds: 300
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
  vmselect:
    cacheMountPath: /select-cache
    clusterNativeListenPort: "8401"
    extraArgs:
      dedup.minScrapeInterval: 60s
    extraEnvs:
      - name: GOMAXPROCS
        value: "4"
    hpa:
      maxReplicas: 6
      metrics:
        - pods:
            metric:
              name: vm_concurrent_select_utilization
            target:
              averageValue: 800m
              type: AverageValue
          type: Pods
      minReplicas: 3
    image:
      tag: v1.131.0-cluster
    minReadySeconds: 30
    podDisruptionBudget:
      maxUnavailable: 1
    port: "8481"
    priorityClassName: monitoring-canary
    replicaCount: 2
    resources:
      limits:
        memory: 3Gi
      requests:
        cpu: 1500m
    rollingUpdateStrategy: RollingUpdate
    serviceSpec:
      metadata:
        annotations:
          service.kubernetes.io/topology-mode: Auto
        name: vmselect-canary-az-aware
      spec: {}
    storage:
      emptyDir: {}
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 2Gi
    terminationGracePeriodSeconds: 60
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
  vmstorage:
    extraArgs:
      dedup.minScrapeInterval: 60s
    extraEnvs:
      - name: GOMAXPROCS
        value: "8"
    image:
      tag: v1.131.0-cluster
    minReadySeconds: 180
    podDisruptionBudget:
      maxUnavailable: 1
    priorityClassName: canary-monitoring
    replicaCount: 6
    resources:
      requests:
        cpu: '7'
      limits:
        memory: 27Gi
    rollingUpdateStrategy: RollingUpdate
    storage:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 500Gi
          storageClassName: hdd
    storageDataPath: /vm-data
    terminationGracePeriodSeconds: 900
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule

dctrwatson avatar Dec 15 '25 23:12 dctrwatson

Setting maxUnavailable: 100% will allow restarting all instances of a component at once, but there will be a small amount of downtime. If you need an HA pair of clusters, look at the distributed Helm chart.
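
For example, applied to the vminsert spec above, it would look roughly like this (a sketch; whether the same field is honoured for the StatefulSet-backed components depends on the operator version):

vminsert:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: "100%"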

tiny-pangolin avatar Dec 16 '25 03:12 tiny-pangolin

I just applied a change to upgrade the tags for all 3 nodes and also added GOMAXPROCS to extraEnvs. Instead of waiting for each node to complete its rollout, all 3 were updated in parallel.

Could you please be a bit more specific? Which node tags were changed (Kubernetes Node)? And what do you mean by all 3 were updated in parallel (3 storage pods updated at once)?

I can only guess that it's related to the podDisruptionBudget and maxUnavailable: 1.

Also, with rollingUpdateStrategy: RollingUpdate, the operator doesn't perform any pod-level actions; it delegates the update process to the kubernetes-controller-manager.
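
Concretely, the generated StatefulSet is expected to carry something like the following (a sketch; only the relevant field is shown). The StatefulSet controller in kube-controller-manager then replaces pods one at a time, from the highest ordinal down:

spec:
  updateStrategy:
    type: RollingUpdate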

f41gh7 avatar Dec 16 '25 15:12 f41gh7

Could you please be a bit more specific? Which node tags were changed (Kubernetes Node)? And what do you mean by all 3 were updated in parallel (3 storage pods updated at once)?

Sorry, I meant that the vmselect, vminsert, and vmstorage components were all updated in parallel.

Also, with rollingUpdateStrategy: RollingUpdate, the operator doesn't perform any pod-level actions; it delegates the update process to the kubernetes-controller-manager.

Yup, the operator updated the vmstorage and vmselect STS and the vminsert Deployment at roughly the same time, rather than doing them serially.

dctrwatson avatar Dec 16 '25 21:12 dctrwatson

@AndrewChubatiuk @vrutkovs Could you please take a look? I was not able to reproduce it. Most probably, the Kubernetes controller-manager doesn't properly reflect statefulSet.status updates during the rollout, and the operator relies on that status when rollingUpdateStrategy: RollingUpdate is used.
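
While a rollout is still in progress, the StatefulSet status is expected to look roughly like this (illustrative values; the revision names are placeholders):

status:
  observedGeneration: 10
  replicas: 6
  updatedReplicas: 3                      # pods already running updateRevision
  currentRevision: vmstorage-example-old
  updateRevision: vmstorage-example-new   # differs from currentRevision until the rollout finishes

Only once updatedReplicas equals replicas and updateRevision matches currentRevision should the component be treated as fully rolled out.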

f41gh7 avatar Jan 05 '26 18:01 f41gh7

Started rolling out an image-only change this week, and so far this has happened only once out of 6 VMCluster objects.

I didn't notice it until about 15 minutes after it happened. Here's the status of the vmstorage STS:

status:
  availableReplicas: 5
  collisionCount: 0
  currentReplicas: 2
  currentRevision: vmstorage-internal-686c76f5b8
  observedGeneration: 10
  readyReplicas: 5
  replicas: 6
  updateRevision: vmstorage-internal-545b68688f
  updatedReplicas: 3

The vmoperator only logged this when it updated the vmselect STS (with the diff showing the image change):

logger":"controller.VMCluster","msg":"updating statefulset vmselect-internal configuration, is_current_equal=false,is_prev_equal=false,is_prev_nil=false

dctrwatson avatar Jan 07 '26 00:01 dctrwatson