XPK creates a broken GPU NodePool with --spot flag

Open · m-strzelczyk opened this issue 10 months ago · 1 comment

My scenario:

$ export CLUSTER_NAME=<>
$ export NUM_NODES=2
$ export ZONE=us-east4-b
$ export PROJECT_ID=<>

$ xpk version
[XPK] Starting xpk
[XPK] xpk_version: v0.7.1
[XPK] XPK Done.

$ xpk cluster create \
--cluster $CLUSTER_NAME \
--device-type=h100-mega-80gb-8 \
--num-nodes=$NUM_NODES \
--zone=$ZONE \
--project=$PROJECT_ID \
--spot \
--default-pool-cpu-machine-type=e2-standard-32 

This results in a cluster whose GPU NodePool is in an Error state:

Google Compute Engine: Invalid value for field 'resource.properties.scheduling.preemptible': 'false'. Scheduling must have preemptible be false when AutomaticRestart is true.
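
For anyone reproducing this, one possible way to confirm the NodePool state from the command line is sketched below. It assumes the same `CLUSTER_NAME`, `ZONE`, and `PROJECT_ID` variables exported above; the `show_nodepool_status` helper name is hypothetical and not part of xpk, and the selected columns are only a guess at what is useful to look at.

```shell
# Hypothetical helper (not part of xpk): list the cluster's node pools
# together with their status and spot/preemptible settings.
# Assumes CLUSTER_NAME, ZONE and PROJECT_ID are already exported.
show_nodepool_status() {
  gcloud container node-pools list \
    --cluster="$CLUSTER_NAME" \
    --zone="$ZONE" \
    --project="$PROJECT_ID" \
    --format="table(name,status,statusMessage,config.spot,config.preemptible)"
}
```

Calling `show_nodepool_status` after `xpk cluster create` finishes should list the GPU NodePool alongside the default pool, with its status and any status message reported by GKE.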

m-strzelczyk · Mar 26 '25 14:03

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] · Oct 30 '25 02:10