SDK-199 Auth Retry policy completely bypassed
🔹 Jira Ticket(s)
✏️ Description
Fixes auth retry for expired JWT refresh: scheduled token refresh no longer bypasses pauseAuthRetries or authRetryPolicy.maxRetry, preventing runaway/infinite onAuthTokenRequested loops.
Added unit tests to lock pause + maxRetry behavior on scheduled refresh, and reset retry count after a successful refresh is queued.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 69.48%. Comparing base (f896dc7) to head (4a72ae5).
Additional details and impacted files
@@ Coverage Diff @@
## master #987 +/- ##
==========================================
+ Coverage 69.45% 69.48% +0.03%
==========================================
Files 109 109
Lines 8944 8953 +9
==========================================
+ Hits 6212 6221 +9
Misses 2732 2732
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Pull Request Review Comment
Please Consider Refactoring to Option 1 from the Ticket
The current fix works, but I strongly recommend simplifying to the Option 1 approach that was outlined in the ticket description. Here's why:
The Core Problem
This bug exists because shouldIgnoreRetryPolicy: true bypasses safety limits (pauseAuthRetry and maxRetry). The ticket correctly identifies these as critical controls that prevent infinite loops and battery drain.
The question is: should we ever allow these safety limits to be bypassed?
Current Implementation: Selective Bypass
Your fix introduces a flag isInScheduledRefreshCallback to detect when we're in a scheduled refresh, then enforces safety limits only in that specific case:
private func shouldPauseRetry(_ shouldIgnoreRetryPolicy: Bool) -> Bool {
if pauseAuthRetry {
return true
}
// Only enforces maxRetry for scheduled refreshes
if isInScheduledRefreshCallback {
return retryCount >= authRetryPolicy.maxRetry
}
// Still allows bypass in other cases
return retryCount >= authRetryPolicy.maxRetry && !shouldIgnoreRetryPolicy
}
This means: Any code path that calls requestNewAuthToken(shouldIgnoreRetryPolicy: true) outside of scheduled refreshes can still bypass maxRetry. You've fixed the symptom (scheduled refresh infinite loop) but left the backdoor open.
Option 1: Always Enforce Safety Limits
private func shouldPauseRetry(_ shouldIgnoreRetryPolicy: Bool) -> Bool {
if pauseAuthRetry {
return true
}
if retryCount >= authRetryPolicy.maxRetry {
return true
}
return false
}
This means: No code path can bypass safety limits, ever. Period.
Why This Is Better
1. Prevents ALL infinite loops, not just scheduled refresh loops
If tomorrow someone adds code that calls requestNewAuthToken(shouldIgnoreRetryPolicy: true) in a different context (error handler, manual retry, etc.), your current fix won't protect them. Option 1 would.
2. No state management complexity
Your approach requires:
- A new instance variable
- Setting the flag before the call
- Clearing the flag after the call
- Future developers understanding this flag exists and what it means
Option 1 requires: none of that.
3. Thread-safe by default
Flag management patterns can be racy if methods get called from multiple threads. Option 1 has no such risk.
4. Matches the stated intent
The ticket says: "pauseAuthRetry and maxRetry are safety controls that should ALWAYS be respected."
Your implementation respects them sometimes. Option 1 respects them always.
5. Fewer lines of code
Current: 12 lines in shouldPauseRetry() + flag declaration + flag management
Option 1: 8 lines total
Simple code is maintainable code.
What About shouldIgnoreRetryPolicy?
I think there's confusion about what this parameter is supposed to do. Based on the name and ticket context, it seems like it was meant to bypass retry backoff delays, not safety limits.
If the intent is "try again immediately without waiting," that's fine. But that's different from "ignore maxRetry and keep trying forever."
Proposal: Safety limits (pause + maxRetry) should be absolute. If we need to bypass backoff timing, that's a separate concern and should be named/handled differently.
The Root Cause Matters
The ticket identifies the root cause perfectly:
"The shouldIgnoreRetryPolicy parameter was likely introduced to allow immediate retries in certain scenarios (like manual user-triggered refreshes), but inadvertently became part of the automatic token expiration flow, bypassing critical safety controls."
The fix shouldn't be to add a special case that detects when we're in the problematic flow. The fix should be to remove the ability to bypass safety controls because that's fundamentally dangerous regardless of the flow.
What If We Need to Bypass?
If there's a legitimate use case where we need to retry beyond maxRetry, that should be:
- Explicitly documented in code comments
- A separate code path with a different method name
- Something we consciously choose, not accidentally enable via a boolean parameter
Right now, shouldIgnoreRetryPolicy: true is a loaded gun sitting in the codebase. Your fix adds a safety on one trigger, but the gun is still loaded.
Bottom Line
Option 1 is:
- Simpler (less code, no state management)
- Safer (no infinite loop scenarios, thread-safe)
- Clearer (safety limits are always enforced)
- More maintainable (future developers can't bypass accidentally)
The only downside is it changes behavior for all callers of shouldIgnoreRetryPolicy: true. But that's actually a feature because bypassing safety limits was the bug in the first place.
Please consider refactoring to Option 1. If there's a specific scenario that requires bypassing maxRetry, let's discuss it explicitly so we can design a proper solution.
Agree with Claude that we should take the simpler approach @joaodordio
@sumeruchat We need to have a bigger discussion about this.
The solution you are recommending based on Claude, removes the shouldIgnoreRetryPolicy flag from the implementation.
We need to further discuss offline why we need that flag and what the purpose of it is.