optimize the backoff mechanism

zz-jason opened this issue 4 years ago • 1 comment

Enhancement

Summary

The current backoff-and-retry mechanism contains two backoffs: one inside the callWithRetry() function and another outside it, as in the following pseudocode:

callWithRetry()
    backoff
backoff

These two backoffs can be merged into a single backoff inside callWithRetry():

callWithRetry()
    backoff

Details

The following slow log illustrates the problem. The first backoff happens at:

        {
            "name":"backoff BoRegionMiss",
            "start":"16:21:51.605",
            "end":"16:21:51.626",
            "duration":"21ms"
        },

The second backoff happens at:

        {
            "name":"backoff BoRegionMiss",
            "start":"16:21:51.626",
            "end":"16:21:51.666",
            "duration":"40ms"
        },

As you can see, BoRegionMiss is executed twice, and the total backoff time grows to 61ms (21ms + 40ms), which is the entire duration of the request. The full slow log entry:

{
    "start":"16:21:51.605",
    "end":"16:21:51.666",
    "duration":"61ms",
    "func":"get",
    "region":"{Region[8590843696] ConfVer[10372] Version[153] Store[8589948024] KeyRange[k1]:[k2]}",
    "key":"k1",
    "spans":[
        {
            "name":"getRegionByKey",
            "start":"16:21:51.605",
            "end":"16:21:51.605",
            "duration":"0ms"
        },
        {
            "name":"callWithRetry tikvpb.Tikv/RawGet",
            "start":"16:21:51.605",
            "end":"16:21:51.626",
            "duration":"21ms"
        },
        {
            "name":"gRPC tikvpb.Tikv/RawGet",
            "start":"16:21:51.605",
            "end":"16:21:51.605",
            "duration":"0ms"
        },
        {
            "name":"backoff BoRegionMiss",
            "start":"16:21:51.605",
            "end":"16:21:51.626",
            "duration":"21ms"
        },
        {
            "name":"backoff BoRegionMiss",
            "start":"16:21:51.626",
            "end":"16:21:51.666",
            "duration":"40ms"
        },
        {
            "name":"getRegionByKey",
            "start":"16:21:51.666",
            "end":"16:21:51.666",
            "duration":"0ms"
        },
        {
            "name":"callWithRetry tikvpb.Tikv/RawGet",
            "start":"16:21:51.666",
            "end":"16:21:51.666",
            "duration":"0ms"
        },
        {
            "name":"gRPC tikvpb.Tikv/RawGet",
            "start":"16:21:51.666",
            "end":"16:21:51.666",
            "duration":"0ms"
        }
    ]
}

Dec 15 '21 13:12 zz-jason

This issue is stale because it has been open 30 days with no activity.

Feb 25 '22 00:02 github-actions[bot]