[BUG] `Predicate NodeAffinity failed` incorrectly handled as user error on GKE

Open jeevb opened this issue 2 years ago • 1 comments

Describe the bug

On GKE v1.25.10-gke.1200:

An interruptible task is marked as failed with the following error message:

[1/1] currentAttempt done. Last Error: USER::Pod Predicate NodeAffinity failed

The following kubelet logs were recovered from the node in question:

{
  "timestamp": "2023-08-10T18:00:54Z",
  "log": "Starting kubelet."
}
{
  "timestamp": "2023-08-10T18:01:16Z",
  "log": "Shutdown manager detected shutdown event"
}
{
  "timestamp": "2023-08-10T18:03:22Z",
  "log": "Starting kubelet."
}
{
  "timestamp": "2023-08-10T18:03:24Z",
  "log": "Predicate NodeAffinity failed"
}

This suggests that the node was preempted and then recreated with the same name, which makes the event falsely appear as a node restart. Per the cluster logs, the kubelet force-deletes the pod it no longer recognizes, and propeller then incorrectly handles the resulting failure as a user error.
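To make the reasoning about the log sequence concrete, here is a minimal sketch that scans kubelet log entries for the ordered pattern seen above: a shutdown event, a kubelet start, and then the `Predicate NodeAffinity failed` message. The type and function names (`kubeletLog`, `parseKubeletLogs`, `looksLikeRecreatedNode`, `detectFromJSON`) are hypothetical and not part of Flyte or Kubernetes, and the ordering check is only a heuristic for illustration, not proof of a preemption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// kubeletLog mirrors the two fields shown in the recovered log entries.
type kubeletLog struct {
	Timestamp string `json:"timestamp"`
	Log       string `json:"log"`
}

// parseKubeletLogs decodes a stream of concatenated JSON objects.
func parseKubeletLogs(r io.Reader) ([]kubeletLog, error) {
	dec := json.NewDecoder(r)
	var entries []kubeletLog
	for {
		var e kubeletLog
		err := dec.Decode(&e)
		if err == io.EOF {
			return entries, nil
		}
		if err != nil {
			return nil, err
		}
		entries = append(entries, e)
	}
}

// looksLikeRecreatedNode reports whether the entries contain, in order,
// a shutdown event, a kubelet start, and a NodeAffinity predicate failure.
// This is an illustrative heuristic (assumption), not a Flyte check.
func looksLikeRecreatedNode(entries []kubeletLog) bool {
	wanted := []string{
		"Shutdown manager detected shutdown event",
		"Starting kubelet.",
		"Predicate NodeAffinity failed",
	}
	next := 0
	for _, e := range entries {
		if next < len(wanted) && strings.Contains(e.Log, wanted[next]) {
			next++
		}
	}
	return next == len(wanted)
}

// detectFromJSON is a convenience wrapper around the two steps above.
func detectFromJSON(raw string) (bool, error) {
	entries, err := parseKubeletLogs(strings.NewReader(raw))
	if err != nil {
		return false, err
	}
	return looksLikeRecreatedNode(entries), nil
}

// sample reproduces the four log entries from this report.
const sample = `
{"timestamp": "2023-08-10T18:00:54Z", "log": "Starting kubelet."}
{"timestamp": "2023-08-10T18:01:16Z", "log": "Shutdown manager detected shutdown event"}
{"timestamp": "2023-08-10T18:03:22Z", "log": "Starting kubelet."}
{"timestamp": "2023-08-10T18:03:24Z", "log": "Predicate NodeAffinity failed"}
`

func main() {
	matched, err := detectFromJSON(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(matched) // prints "true" for the entries in this report
}
```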

Expected behavior

`Predicate NodeAffinity failed` should be treated as a system failure, similar to a node interruption, so that the system retry budget is respected.
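A minimal sketch of the requested behavior, assuming the failure can be recognized from the pod's status message. This is not propeller's actual code: `ErrorKind`, `classifyPodFailure`, `retryBudget`, and the marker list are hypothetical names introduced for illustration, and the budget sizes in `main` are made-up values.

```go
package main

import (
	"fmt"
	"strings"
)

// ErrorKind is a hypothetical stand-in for the USER/SYSTEM distinction
// visible in the error string "USER::Pod Predicate NodeAffinity failed".
type ErrorKind string

const (
	KindUser   ErrorKind = "USER"
	KindSystem ErrorKind = "SYSTEM"
)

// systemFailureMarkers lists pod status messages that should count as
// infrastructure failures. The single entry comes from this report.
var systemFailureMarkers = []string{
	"Predicate NodeAffinity failed",
}

// classifyPodFailure returns KindSystem when the message contains a known
// infrastructure marker. Falling back to KindUser is a simplification
// made for this sketch.
func classifyPodFailure(message string) ErrorKind {
	for _, marker := range systemFailureMarkers {
		if strings.Contains(message, marker) {
			return KindSystem
		}
	}
	return KindUser
}

// retryBudget tracks user and system attempts separately (assumed model).
type retryBudget struct {
	userUsed, userMax     int
	systemUsed, systemMax int
}

// consume charges one failed attempt to the matching budget and reports
// whether another attempt is still allowed.
func (b *retryBudget) consume(kind ErrorKind) bool {
	if kind == KindSystem {
		b.systemUsed++
		return b.systemUsed < b.systemMax
	}
	b.userUsed++
	return b.userUsed < b.userMax
}

func main() {
	// Illustrative budgets: one user attempt, three system attempts.
	budget := &retryBudget{userMax: 1, systemMax: 3}
	kind := classifyPodFailure("Pod Predicate NodeAffinity failed")
	fmt.Println(kind, budget.consume(kind)) // prints "SYSTEM true"
}
```

With a user budget of one attempt, the same failure classified as `USER` would exhaust the budget immediately, which matches the `[1/1]` in the reported error.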

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

jeevb • Aug 14 '23 20:08

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏

github-actions[bot] • Jun 30 '24 00:06

Hello 👋, this issue has been inactive for over 90 days and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

github-actions[bot] • May 16 '25 00:05