
Add tpu resource flavor to kueue

Open • wstcliyu opened this issue 9 months ago • 1 comment

Fixes / Features

  • Add tpu resource flavor to kueue config. AxLearn workloads request cpu and memory as well as tpu, so xpk needs to add cpu and memory to the tpu flavor for AxLearn workloads to be admitted by kueue.
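
For illustration only, here is a rough sketch of what a Kueue ClusterQueue resource group covering cpu and memory alongside tpu could look like, built as a plain Python dict. The helper names, the flavor name, the queue name, and the quota values are hypothetical and are not taken from this PR; only the `kueue.x-k8s.io/v1beta1` field names (`resourceGroups`, `coveredResources`, `flavors`, `nominalQuota`) follow the Kueue API.

```python
def tpu_resource_group(flavor_name, tpu_chips, cpu, memory):
    """Build one ClusterQueue resource group whose single flavor covers
    tpu, cpu, and memory (hypothetical helper, not xpk code)."""
    quotas = {"google.com/tpu": tpu_chips, "cpu": cpu, "memory": memory}
    return {
        # Kueue only admits a workload if every resource it requests
        # is covered by some resource group in the ClusterQueue.
        "coveredResources": list(quotas),
        "flavors": [
            {
                "name": flavor_name,
                "resources": [
                    {"name": name, "nominalQuota": quota}
                    for name, quota in quotas.items()
                ],
            }
        ],
    }


def cluster_queue(name, resource_group):
    """Wrap a resource group in a minimal ClusterQueue manifest."""
    return {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "ClusterQueue",
        "metadata": {"name": name},
        "spec": {
            "namespaceSelector": {},  # match all namespaces
            "resourceGroups": [resource_group],
        },
    }


# Example values are placeholders, not real quotas.
manifest = cluster_queue(
    "example-cluster-queue",
    tpu_resource_group("example-tpu-flavor", 16, "480", "2000Gi"),
)
```

Without cpu and memory in `coveredResources`, a workload that requests them next to `google.com/tpu` would not fit any flavor, which is the admission failure described above.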

Testing / Documentation

Testing details.

  • [ y ] Tests pass
  • [ y ] Appropriate changes to documentation are included in the PR

wstcliyu avatar Jul 16 '25 00:07 wstcliyu

This looks good to me, but I wanted to check one thing: would you be able to confirm that a CPU pod doesn't land on a TPU nodepool? We may need to add tolerations for this in XPK, but I'm not sure. You can test this by creating a simple cluster (or using an existing cluster) that does not have any CPU nodepools, launching a CPU-only jobset on it, and confirming that it's unschedulable.
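
As a way to reason about the expected result before running that test, here is a simplified model of Kubernetes taint/toleration matching. It is only a sketch: it assumes the TPU nodes carry a `google.com/tpu=present:NoSchedule` taint (worth verifying on the actual nodepool), the function names are made up for illustration, and the real scheduler considers much more than taints.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint
    (simplified version of the Kubernetes matching rules)."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False  # an empty effect matches every effect
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return key == "" or key == taint["key"]
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def is_schedulable(pod_tolerations, node_taints):
    """A pod can land on a node only if every NoSchedule/NoExecute
    taint on that node is tolerated by at least one toleration."""
    blocking = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in blocking
    )


# Assumed taint on TPU nodes; confirm with `kubectl describe node`.
TPU_TAINT = {"key": "google.com/tpu", "value": "present", "effect": "NoSchedule"}

cpu_only_pod = []  # no tolerations at all
tpu_pod = [{"key": "google.com/tpu", "operator": "Exists", "effect": "NoSchedule"}]
```

Under this model, `is_schedulable(cpu_only_pod, [TPU_TAINT])` is false, so a CPU-only jobset should stay pending on a cluster with only TPU nodepools unless something adds a matching toleration to its pods.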

SujeethJinesh avatar Jul 16 '25 00:07 SujeethJinesh