[Core feature] Allow flyteadmin to start even if OIDC is unavailable (Improve flyteadmin startup resiliency)
Tracking issue
https://github.com/flyteorg/flyte/issues/5701
Why are the changes needed?
Today, the flyteadmin pod cannot start until the OIDC provider is healthy and available; until then, the pod is stuck in the Error state. In some Kubernetes configurations, this failing pod can cause deployment-wide issues. The startup behavior could be made more resilient.
(Note that this applies only to configurations using useAuth=true.)
What changes were proposed in this pull request?
A better approach in these configurations is to allow flyteadmin to start up even if the OIDC provider is unavailable, and then retry initializing the OIDC provider later in the deployment's lifetime. This is more resilient, and it can be made configurable.
This PR adds an onlyStartIfOIDCIsAvailable config option that controls this behavior.
How was this patch tested?
This writeup shows the "good" flow when onlyStartIfOIDCIsAvailable is enabled and OIDC is unhealthy for a period: https://gist.github.com/ddl-rliu/4c09862404f46a5adbc451025160e0eb
Check all the applicable boxes
- [ ] I updated the documentation accordingly.
- [ ] All new and existing tests passed.
- [x] All commits are signed-off.
Codecov Report
Attention: Patch coverage is 2.38095% with 41 lines in your changes missing coverage. Please review.
Project coverage is 36.17%. Comparing base (f075b34) to head (080a4cf). Report is 423 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| flyteadmin/auth/auth_context.go | 2.38% | 41 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #5702 +/- ##
===========================================
- Coverage 60.92% 36.17% -24.76%
===========================================
Files 796 1302 +506
Lines 51689 109627 +57938
===========================================
+ Hits 31494 39660 +8166
- Misses 17288 65822 +48534
- Partials 2907 4145 +1238
| Flag | Coverage Δ | |
|---|---|---|
| unittests-datacatalog | 51.37% <ø> (-17.95%) | :arrow_down: |
| unittests-flyteadmin | 55.29% <2.38%> (-3.44%) | :arrow_down: |
| unittests-flytecopilot | 12.17% <ø> (-5.62%) | :arrow_down: |
| unittests-flytectl | 62.18% <ø> (-5.24%) | :arrow_down: |
| unittests-flyteidl | 7.12% <ø> (-71.92%) | :arrow_down: |
| unittests-flyteplugins | 53.34% <ø> (-8.51%) | :arrow_down: |
| unittests-flytepropeller | 41.71% <ø> (-15.54%) | :arrow_down: |
| unittests-flytestdlib | 55.35% <ø> (-10.25%) | :arrow_down: |
@Sovietaced brings up a good point regarding this change.
Cleaning stale PRs. Please reopen if you want to discuss this further.