
Pods with disks can schedule to the wrong node

Open zachomedia opened this issue 3 years ago • 8 comments

We should use nodeSelectors to ensure that pods with disks are scheduled to the correct nodes (either no zone or the correct zone) so their disks can mount.

Kubeflow and other systems have some pods that depend on specific node locations.

zachomedia avatar Apr 24 '22 00:04 zachomedia

The shared-daaas-system Elasticsearch nodes have had this problem for a while; however, we seem to be stuck now, as there no longer appear to be any nodes in zone '1' (only a temp node in zone 0 currently). The nodes have been switched over to canadacentral-x zones.

HTTPStatusCode: 400, RawError: { "error": { "code": "BadRequest", "message": "Disk /subscriptions/9f29402c-64f1-4691-853c-a14607472bdc/resourceGroups/aaw-prod-cc-00-rg-aks-managed/providers/Microsoft.Compute/disks/restore-52eecf64-79d8-4c69-9354-7e432c99cdc9 cannot be attached to the VM because it is not in zone '1'." } }

Is zone canadacentral-1 the same as the previous zone 1? Where is the zone mapping configured? Is it on the volume itself?

cc @chuckbelisle

vexingly avatar Jun 27 '22 21:06 vexingly

This is where AAW prod is in a bit of an odd place: the zone '1' disks are the ones that were migrated from the old environment, and they are technically not in an availability zone. Therefore, they need to be scheduled on one of the temp nodes.

The workloads are likely missing nodeSelectors, which would force them onto the right nodes at scheduling time.
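As a minimal sketch of that approach, a nodeSelector on the pod spec can pin the workload to the nodes carrying the temp nodes' zone label (the pod name and image below are illustrative placeholders, not values from this cluster):

    # Sketch only: pin a pod to nodes labelled with zone '0'.
    # The name and image are assumptions for illustration.
    apiVersion: v1
    kind: Pod
    metadata:
      name: es-data-example
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: '0'
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.17.10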

zachomedia avatar Jun 27 '22 22:06 zachomedia

@zachomedia thanks for the background. I was going to try the nodeSelector method for this specific issue; however, there doesn't seem to be a temp node in zone 1 anymore, only zone 0... how are those nodes provisioned? :)

Also is there a plan to migrate the storage so we don't need the temp nodes anymore?

vexingly avatar Jun 29 '22 15:06 vexingly

@vexingly I don't believe the zone matters on the temp nodes because they are not officially attached to a zone.

zachomedia avatar Jun 29 '22 15:06 zachomedia

Oh, it's confusing that they are labeled with a zone... but you're right, it doesn't seem to matter for accessing the storage! 👍

vexingly avatar Jun 29 '22 18:06 vexingly

@zachomedia are you looking at this for this sprint?

sylus avatar Jun 30 '22 13:06 sylus

I think @vexingly has it under control, but if anyone needs my help I'm around :)

zachomedia avatar Jun 30 '22 13:06 zachomedia

The shared-daaas-system Elasticsearch has the nodeSelector now and is working well. I don't know of other pods that need something similar, but if we see any of them start to fail, we can fix them up then.

vexingly avatar Jun 30 '22 13:06 vexingly

This issue arose once again on Dec. 2nd, 2023 during the upgrade to Kubernetes 1.26.

While trying to fix it by setting nodeSelectors, it turned out that there are policies in place that strip them from some workloads. To get around this, a nodeAffinity was set on the PVs that are not zonal (mostly just disks restored from the previous cluster).

    nodeAffinity:
      required:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - '0'

Warning: nodeAffinities are immutable. Use caution when setting them, and set persistentVolumeReclaimPolicy to Retain if you need to delete and recreate the PV/PVC.
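As a sketch of how those two settings fit together on a full PersistentVolume (the name, size, and volumeHandle below are illustrative placeholders, not values from this cluster):

    # Sketch only: a non-zonal restored disk pinned to the zone-'0' nodes.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: restored-disk-example            # placeholder name
    spec:
      capacity:
        storage: 128Gi                       # placeholder size
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain  # keep the Azure disk if the PV object is deleted
      csi:
        driver: disk.csi.azure.com
        volumeHandle: /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/<disk-name>  # placeholder
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                    - '0'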

justbert avatar Dec 04 '23 13:12 justbert