xpk should fail fast if a JobSet or PathwaysJob will result in an invalid value for the `jobset.sigs.k8s.io/coordinator` label

Open GiuseppeTT opened this issue 3 months ago • 1 comment

JobSet's admission webhook now rejects requests to create JobSet objects that would result in an invalid value for the `jobset.sigs.k8s.io/coordinator` label. In most cases, this is equivalent to limiting the length of the JobSet name when the coordinator feature is enabled. See https://github.com/kubernetes-sigs/jobset/issues/1056 and https://github.com/kubernetes-sigs/jobset/pull/1079.

Following this fail-fast principle, xpk should also preemptively fail commands that would lead to this invalid state.

For xpk commands that create a JobSet object directly, the error from the JobSet admission webhook can likely just be bubbled up to the user.

The core problem arises when xpk creates a PathwaysJob object. Since the PathwaysJob controller does not have an admission webhook, the creation request succeeds. However, the PathwaysJob controller will then try to create a child JobSet at runtime and continuously fail because the JobSet webhook will block the invalid request. This results in a confusing, difficult-to-debug failure loop for the user, as the initial xpk command appeared to succeed.
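A minimal sketch of what such a preflight check could look like on the xpk side, in Python. It assumes the coordinator label value has the form `<jobset-name>-<replicated-job>-<job-index>-<pod-index>.<subdomain>`, with the subdomain defaulting to the JobSet name; that format should be confirmed against the JobSet source before relying on it. The function names are hypothetical and not part of xpk. The 63-character limit and the character rules are the standard Kubernetes label value constraints.

```python
import re

# Standard Kubernetes label value constraints: at most 63 characters, empty or
# starting and ending with an alphanumeric, with '-', '_', '.' allowed between.
MAX_LABEL_VALUE_LENGTH = 63
_LABEL_VALUE_PATTERN = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def coordinator_label_value(jobset_name, replicated_job, job_index=0,
                            pod_index=0, subdomain=None):
    """Build the value assumed to be written to jobset.sigs.k8s.io/coordinator.

    ASSUMPTION: the format below mirrors the JobSet coordinator endpoint; it is
    not taken from this issue and should be checked against JobSet itself.
    """
    subdomain = subdomain or jobset_name  # assumed default subdomain
    return f"{jobset_name}-{replicated_job}-{job_index}-{pod_index}.{subdomain}"


def label_value_errors(value):
    """Return a list of reasons why `value` is not a valid label value."""
    errors = []
    if len(value) > MAX_LABEL_VALUE_LENGTH:
        errors.append(
            f"length {len(value)} exceeds {MAX_LABEL_VALUE_LENGTH} characters"
        )
    if not _LABEL_VALUE_PATTERN.fullmatch(value):
        errors.append("contains characters not allowed in a label value")
    return errors


def check_coordinator_label(jobset_name, replicated_job, **kwargs):
    """Hypothetical preflight hook: raise before any object is created."""
    value = coordinator_label_value(jobset_name, replicated_job, **kwargs)
    errors = label_value_errors(value)
    if errors:
        raise ValueError(
            f"coordinator label value {value!r} would be invalid: "
            + "; ".join(errors)
        )
    return value
```

Running a check like this before submitting a PathwaysJob would surface the problem in the xpk command itself instead of in the controller's retry loop. Because the JobSet name appears twice in the assumed format, shortening the workload name is usually the simplest remedy.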

GiuseppeTT, Oct 28 '25 21:10

Created a task: http://b/456100304.

jamOne-, Oct 29 '25 11:10