client-java
optimize the backoff mechanism
Enhancement
Summary
The current backoff and retry mechanism contains two backoffs: the first is inside the callWithRetry() function, and the second is outside it. This can be expressed in pseudocode as follows:
callWithRetry()
    backoff
backoff
These backoffs can be integrated into one backoff like the following:
callWithRetry()
backoff
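The difference between the two shapes can be sketched in Java. This is only an illustration, not the actual client-java code: `RecordingBackoffer`, `failTimes`, `getNested`, and `getSingle` are hypothetical names, the 20 ms base and the doubling policy are assumptions chosen to resemble the log in this issue, and the backoffer records the sleep durations instead of actually sleeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class BackoffSketch {
    /** Hypothetical backoffer: records each sleep (in ms) instead of sleeping. */
    static final class RecordingBackoffer {
        final List<Integer> sleepsMs = new ArrayList<>();
        private int nextMs;
        private final int capMs;

        RecordingBackoffer(int baseMs, int capMs) {
            this.nextMs = baseMs;
            this.capMs = capMs;
        }

        void doBackoff() {
            sleepsMs.add(nextMs);
            nextMs = Math.min(nextMs * 2, capMs); // assumed exponential growth, no jitter
        }
    }

    /** Hypothetical RPC that fails `n` times (e.g. region miss) and then succeeds. */
    static BooleanSupplier failTimes(int n) {
        int[] remaining = {n};
        return () -> {
            if (remaining[0] > 0) {
                remaining[0]--;
                return false;
            }
            return true;
        };
    }

    /** One attempt; on failure it backs off inside the function. */
    static boolean callWithRetry(BooleanSupplier rpc, RecordingBackoffer bo) {
        boolean ok = rpc.getAsBoolean();
        if (!ok) {
            bo.doBackoff(); // backoff inside callWithRetry()
        }
        return ok;
    }

    /** Shape described as the current behavior: the caller backs off a second time. */
    static void getNested(BooleanSupplier rpc, RecordingBackoffer bo) {
        while (!callWithRetry(rpc, bo)) {
            bo.doBackoff(); // second backoff, outside callWithRetry()
        }
    }

    /** Proposed shape: a single backoff per failed attempt. */
    static void getSingle(BooleanSupplier rpc, RecordingBackoffer bo) {
        while (!callWithRetry(rpc, bo)) {
            // no additional backoff here
        }
    }

    public static void main(String[] args) {
        RecordingBackoffer nested = new RecordingBackoffer(20, 1000);
        getNested(failTimes(1), nested);
        System.out.println("nested: " + nested.sleepsMs); // nested: [20, 40]

        RecordingBackoffer single = new RecordingBackoffer(20, 1000);
        getSingle(failTimes(1), single);
        System.out.println("single: " + single.sleepsMs); // single: [20]
    }
}
```

Under these assumptions, a single failed attempt costs 60 ms of sleep in the nested shape and 20 ms in the merged shape.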
Details
The following slow log illustrates the problem. The first backoff happens at:
{
  "name":"backoff BoRegionMiss",
  "start":"16:21:51.605",
  "end":"16:21:51.626",
  "duration":"21ms"
},
The second backoff happens at:
{
  "name":"backoff BoRegionMiss",
  "start":"16:21:51.626",
  "end":"16:21:51.666",
  "duration":"40ms"
},
As the full slow log shows, BoRegionMiss is executed twice for a single region miss, which raises the total backoff time to 61 ms (21 ms + 40 ms):
{
  "start":"16:21:51.605",
  "end":"16:21:51.666",
  "duration":"61ms",
  "func":"get",
  "region":"{Region[8590843696] ConfVer[10372] Version[153] Store[8589948024] KeyRange[k1]:[k2]}",
  "key":"k1",
  "spans":[
    {
      "name":"getRegionByKey",
      "start":"16:21:51.605",
      "end":"16:21:51.605",
      "duration":"0ms"
    },
    {
      "name":"callWithRetry tikvpb.Tikv/RawGet",
      "start":"16:21:51.605",
      "end":"16:21:51.626",
      "duration":"21ms"
    },
    {
      "name":"gRPC tikvpb.Tikv/RawGet",
      "start":"16:21:51.605",
      "end":"16:21:51.605",
      "duration":"0ms"
    },
    {
      "name":"backoff BoRegionMiss",
      "start":"16:21:51.605",
      "end":"16:21:51.626",
      "duration":"21ms"
    },
    {
      "name":"backoff BoRegionMiss",
      "start":"16:21:51.626",
      "end":"16:21:51.666",
      "duration":"40ms"
    },
    {
      "name":"getRegionByKey",
      "start":"16:21:51.666",
      "end":"16:21:51.666",
      "duration":"0ms"
    },
    {
      "name":"callWithRetry tikvpb.Tikv/RawGet",
      "start":"16:21:51.666",
      "end":"16:21:51.666",
      "duration":"0ms"
    },
    {
      "name":"gRPC tikvpb.Tikv/RawGet",
      "start":"16:21:51.666",
      "end":"16:21:51.666",
      "duration":"0ms"
    }
  ]
}
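To double-check the numbers in the log above, the backoff spans can be summed from their start and end timestamps. The following is a small sketch using only `java.time`; the class name `BackoffTotal` and the array layout (name, start, end) are illustrative choices, and the span values are copied from the slow log.

```java
import java.time.Duration;
import java.time.LocalTime;

public class BackoffTotal {
    /** Milliseconds between two "HH:mm:ss.SSS" timestamps on the same day. */
    static long spanMillis(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).toMillis();
    }

    /** Sums the duration of every span whose name starts with "backoff ". */
    static long totalBackoffMillis(String[][] spans) {
        long total = 0;
        for (String[] span : spans) {
            if (span[0].startsWith("backoff ")) {
                total += spanMillis(span[1], span[2]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // {name, start, end}, taken from the slow log in this issue.
        String[][] spans = {
            {"getRegionByKey", "16:21:51.605", "16:21:51.605"},
            {"callWithRetry tikvpb.Tikv/RawGet", "16:21:51.605", "16:21:51.626"},
            {"gRPC tikvpb.Tikv/RawGet", "16:21:51.605", "16:21:51.605"},
            {"backoff BoRegionMiss", "16:21:51.605", "16:21:51.626"},
            {"backoff BoRegionMiss", "16:21:51.626", "16:21:51.666"},
        };
        System.out.println(totalBackoffMillis(spans) + " ms"); // 61 ms
    }
}
```

The two BoRegionMiss spans account for the entire 61 ms of the request; the gRPC call and the region lookups each take 0 ms.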