xpk
xpk copied to clipboard
XPK creates a broken GPU NodePool with --spot flag
My scenario:
$ export CLUSTER_NAME=<>
$ export NUM_NODES=2
$ export ZONE=us-east4-b
$ export PROJECT_ID=<>
$ xpk version
[XPK] Starting xpk
[XPK] xpk_version: v0.7.1
[XPK] XPK Done.
$ xpk cluster create \
--cluster $CLUSTER_NAME \
--device-type=h100-mega-80gb-8 \
--num-nodes=$NUM_NODES \
--zone=$ZONE \
--project=$PROJECT_ID \
--spot \
--default-pool-cpu-machine-type=e2-standard-32
Results in a cluster with a NodePool that is in Error state: Google Compute Engine: Invalid value for field 'resource.properties.scheduling.preemptible': 'false'. Scheduling must have preemptible be false when AutomaticRestart is true.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.