[Core feature] Allow flyteadmin to start even if OIDC is unavailable (Improve flyteadmin startup resiliency)

Open ddl-rliu opened this issue 1 year ago • 1 comments

Motivation: Why do you think this is important?

Today, the flyteadmin pod cannot start until the OIDC provider is healthy and available; until then, the pod is stuck in the Error state. In some Kubernetes configurations, this failing pod can cause deployment-wide issues. The startup behavior could be made more resilient.

(Note that this applies to configurations using useAuth=true)

Goal: What should the final outcome look like, ideally?

A better approach in these configurations is to allow flyte to start up, even if the OIDC provider is unavailable. Then, try to re-initialize the OIDC provider later in the deployment lifespan. This is a more resilient approach, and it can be made configurable.

Describe alternatives you've considered

A workaround is to disable Flyte until the OIDC provider is available.

Propose: Link/Inline OR Additional context

Proposed fix here: https://github.com/flyteorg/flyte/pull/5702

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

ddl-rliu avatar Aug 28 '24 19:08 ddl-rliu

The issue of FlyteAdmin being blocked from starting up until the OIDC provider is healthy is known. According to Flyte's documentation, FlyteAdmin requires an OIDC provider for authentication when useAuth=true. This can cause deployment issues if the OIDC provider is unavailable.

A proposed solution is to allow FlyteAdmin to start even if the OIDC provider is unavailable and re-initialize the OIDC provider later. This approach would improve resiliency and can be made configurable. Currently, a workaround is to disable Flyte until the OIDC provider is available.

Would you like more details on how to implement this or any other specific aspect?

This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

runllm[bot] avatar Aug 28 '24 19:08 runllm[bot]

A better approach in these configurations is to allow flyte to start up, even if the OIDC provider is unavailable. Then, try to re-initialize the OIDC provider later in the deployment lifespan. This is a more resilient approach, and it can be made configurable.

This is a double-edged sword. If I recall correctly, there was some recent work to indefinitely cache something related to OIDC on boot. If the OIDC provider is down and you do a normal rolling deployment, you could end up in a worse state: you previously had working pods, but now you have broken ones.

Sovietaced avatar Oct 30 '24 05:10 Sovietaced

If I recall correctly there was some work done recently to indefinitely cache something related to OIDC on boot.

@Sovietaced, can you point to this change? Are you thinking of https://github.com/flyteorg/flyte/pull/5621/?

eapolinario avatar Nov 22 '24 17:11 eapolinario

@Sovietaced, can you point to this change? Are you thinking of #5621?

Yeah, I think so.

Sovietaced avatar Nov 22 '24 18:11 Sovietaced

Hello 👋, this feature request has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 14 days. Thank you for your contribution and understanding! 🙏

github-actions[bot] avatar Aug 20 '25 00:08 github-actions[bot]

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

github-actions[bot] avatar Sep 04 '25 00:09 github-actions[bot]