retry: Default MaxBackoff is unintuitive
Issue
I would expect the following
func Test(t *testing.T) {
err:=retry.Do(
context.Background(),
func() error {
fmt.Println(time.Now())
return errors.New("some errors")
},
retry.WithInitialBackoff(10*time.Second), retry.WithMultiplier(1), retry.WithMaxAttempts(5),
)
require.NoError(t,err)
}
To output one log line every 10 seconds. However what this actually outputs is
2019-01-16 14:51:16.025567248 -0800 PST m=+0.011095106
2019-01-16 14:51:18.092852159 -0800 PST m=+2.078377531
2019-01-16 14:51:20.357685855 -0800 PST m=+4.343208502
2019-01-16 14:51:22.459165577 -0800 PST m=+6.444685696
2019-01-16 14:51:24.425426817 -0800 PST m=+8.410944570
The reason for this output is the default for MaxBackoff is as follows:
// WithMaxBackoff sets upper limit on backoff duration.
//
// Max backoff of 0 indicates no limit.
//
// If max backoff option is not used, then default value of 2 seconds is used.
While the documentation is good, this caused us to write bugs that affected us in production as I think it's unintuitive.
Proposal
Set the default value of max backoff to 0 instead of 2 seconds, so that the max backoff will be unlimited.
I realize this is a breaking change, but I would believe that most people using this library would expect it to work this way
@ryanmcnamara I don't quite follow the problem above. Maybe you have a typo here? To output one log line every 2 seconds? The output is infact a log line every 2 seconds right?
Could you clarify what you would have expected to happen with the code snippet above?
Fixed my typo Expected that it outputs a lot every 10 seconds since the initial backoff is set to 10 seconds
Cool, maybe a "fix" for this is instead of making MaxBackoff 0, we set it to the initial backoff value if that is configured?
Yes, although this is working as documented, agree that it's unintuitive given the starting parameters.
If the default value isn't a concrete value, I feel like 0 (no default limit) makes the most sense.
@ashrayjain if MaxBackoff is set to the initial backoff value by default, then whenever initial back-off is specified then the retry is effectively not exponential, right? Since back-off is defined as min(initialBackoff * pow(multiplier, $retryAttempt), maxBackoff == 0 ? +Inf : maxBackoff) * (1.0 - randomizationFactor + 2 * rand(0, randomizationFactor)), if maxBackoff == initialBackoff, then the retry behavior just becomes "wait for the initial back-off time and then add jitter".
Cool, "0" works then. I really think we should make sure nobody gets burned by this change though.
Can we excavate all places that use retry.Do currently without setting a MaxBackoff and explicitly add a MaxBackoff(2) in those places to preserve behavior?
Yeah it's hard to say for sure because Mateusz isn't around, but I believe the intent of the default value for max backoff is that most "simple" calls probably don't want to back off for more than 2 seconds at the upper bound. However, this is obviously completely subjective, and does result in very confusing behavior when the initial back-off is set to be above the max.
It sounds like the basic possible paths are:
- Change default value and attempt to excavate/programmatically convert old calls to maintain compatibility
- Keep the current behavior, but improve documentation/flag to users that they will likely need to explicitly configure the max back-off if they set an initial back-off
- Create a v2 version of the library with the new default behavior (we would use the "major version package" approach so that v1 and v2 can coexist in the same codebase)
I would much prefer making the break to the api (first option @nmiyake suggested above). I'm not that worried about excavating the changes, because I believe that all usages of this that don't explicitly set MaxBackoff intended for it to be zero (no limit).
I think adding a release note and letting consumers audit their own code base on upgrade is sufficient
my vote is for approach 3 which is the cleanest and imo, the "right" way to handle this. We can choose to deprecate v1 and remove it entirely after some time if we'd like.
or we can make it a compile break to force people to explicitly pick a value somehow.
@ryanmcnamara all usages of this that don't explicitly set MaxBackoff intended for it to be zero (no limit). is not true in at least a few places that I know of. It doesn't really feel right to hide such a break in release notes which are arguably easy to miss when upgrading (do you really read the all the release nodes for every single dependency bump, especially when not even upgrading past major versions?)
@willdeuschle submitted #186 before either of us saw this ticket. Would folks be ok with that change (using initialBackoff as the max if it's larger than maxBackoff)?