client-go

performance of searchCachedRegion could be enhanced

Open chrysan opened this issue 3 years ago • 3 comments

When many regions are cached in memory, searchCachedRegion becomes slower and holds the global RegionCache read lock for longer, which makes other queries that need to load new regions wait for the write lock. As QPS grows, the mutex contention gets even worse and query latency increases.


findRegionByKey waits for write lock:

goroutine 8144205746 [semacquire]:
sync.runtime_SemacquireMutex(0xc0003ae014, 0x0, 0x1)
    /usr/local/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc0003ae010)
    /usr/local/go/src/sync/mutex.go:138 +0xfc
sync.(*Mutex).Lock(...)
    /usr/local/go/src/sync/mutex.go:81
sync.(*RWMutex).Lock(0xc0003ae010)
    /usr/local/go/src/sync/rwmutex.go:98 +0x97
github.com/pingcap/tidb/store/tikv.(*RegionCache).findRegionByKey(0xc0003ae000, 0xc4040d51a8, 0xc0eb43fce0, 0x13, 0x13, 0xc031612e00, 0x7fcaad8be1f0, 0x0, 0x40)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:582 +0x6c2

searchCachedRegion holds read lock:

goroutine 8056902450 [runnable]:
github.com/pingcap/tidb/store/tikv.(*RegionCache).searchCachedRegion.func1(0x3886ec0, 0xc263c1d180, 0xc1864d85c0)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:914 +0x173
github.com/google/btree.(*node).iterate(0xc323d37c80, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0x101, 0xc0322f0088, 0xc44df40101)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/[email protected]/btree.go:557 +0x1cd
github.com/google/btree.(*node).iterate(0xc225ed4980, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0x101, 0xc0322f0088, 0xc44df40101)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/[email protected]/btree.go:549 +0x115
github.com/google/btree.(*node).iterate(0xc34e0e7640, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0x101, 0xc0322f0088, 0xc44df40101)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/[email protected]/btree.go:549 +0x115
github.com/google/btree.(*node).iterate(0xc0d1673e40, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0xc0322f0101, 0xc0322f0088, 0x20)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/[email protected]/btree.go:549 +0x115
github.com/google/btree.(*node).iterate(0xc3d65de240, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0x1, 0xc0322f0088, 0x32)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/[email protected]/btree.go:549 +0x115
github.com/google/btree.(*BTree).DescendLessOrEqual(...)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/[email protected]/btree.go:795
github.com/pingcap/tidb/store/tikv.(*RegionCache).searchCachedRegion(0xc0003ae000, 0xc44df41aa0, 0x1c, 0x30, 0x11f8800, 0xb)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:914 +0x2ae
github.com/pingcap/tidb/store/tikv.(*RegionCache).findRegionByKey(0xc0003ae000, 0xc0322f07a8, 0xc44df41aa0, 0x1c, 0x30, 0x11bf700, 0xc0322f0318, 0x11f483c, 0x0)
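
To make the locking pattern in the two traces easier to follow, here is a minimal sketch of it. It is a simplified model and not the actual RegionCache code: the names `region`, `regionCache`, `search`, and `insert` are hypothetical, and a sorted slice stands in for the btree. The point is only that the whole lookup runs under the read lock, so a writer that wants to add a newly loaded region has to wait until every in-flight lookup finishes.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"sync"
)

// region is a hypothetical stand-in for a cached region covering [start, end).
type region struct {
	id         uint64
	start, end []byte // an empty end means "unbounded"
}

func (r *region) contains(key []byte) bool {
	return bytes.Compare(r.start, key) <= 0 &&
		(len(r.end) == 0 || bytes.Compare(key, r.end) < 0)
}

// regionCache is a simplified model: one global RWMutex guards an ordered index.
type regionCache struct {
	mu     sync.RWMutex
	sorted []*region // kept ordered by start key
}

// search models the lookup path: the read lock is held for the entire search.
func (c *regionCache) search(key []byte) *region {
	c.mu.RLock()
	defer c.mu.RUnlock()
	// Find the first entry whose start key is greater than key;
	// the candidate region is the entry just before it.
	i := sort.Search(len(c.sorted), func(i int) bool {
		return bytes.Compare(c.sorted[i].start, key) > 0
	})
	if i == 0 {
		return nil
	}
	if r := c.sorted[i-1]; r.contains(key) {
		return r
	}
	return nil
}

// insert models loading a new region: it takes the write lock,
// so it blocks until all readers have released the read lock.
func (c *regionCache) insert(r *region) {
	c.mu.Lock()
	defer c.mu.Unlock()
	i := sort.Search(len(c.sorted), func(i int) bool {
		return bytes.Compare(c.sorted[i].start, r.start) >= 0
	})
	c.sorted = append(c.sorted, nil)
	copy(c.sorted[i+1:], c.sorted[i:])
	c.sorted[i] = r
}

func main() {
	c := &regionCache{}
	c.insert(&region{id: 1, start: []byte("a"), end: []byte("m")})
	c.insert(&region{id: 2, start: []byte("m")})
	fmt.Println(c.search([]byte("k")).id, c.search([]byte("z")).id)
}
```

In this model, the longer `search` takes, the longer `insert` waits, which matches the shape of the contention described above.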

chrysan, Jun 22 '22 15:06

BTW, the memory usage of the region cache could be tracked to guard against the risk of OOM.

chrysan, Jun 23 '22 01:06

Another finding: there are far more cached regions than real live regions.

This use case runs many "truncate table" statements. The eviction of cached regions could be enhanced.

chrysan, Jun 28 '22 03:06

We may consider using a skiplist as a replacement. Compared to a btree, a skiplist allows finer-grained locking.
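
For reference, a minimal single-threaded skiplist with the "greatest key less than or equal to" lookup that a region search needs might look like the sketch below. It is illustrative only: the names `skipList`, `node`, `insert`, and `floor` are hypothetical, and it contains no locking at all. A concurrent variant with per-node locks or lock-free links would have to be built on top of this structure.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

const maxLevel = 16

// node holds one start key and a region id, with one forward link per level.
type node struct {
	key  []byte
	id   uint64
	next []*node
}

type skipList struct {
	head  *node
	level int // number of levels currently in use
	rnd   *rand.Rand
}

func newSkipList(seed int64) *skipList {
	return &skipList{
		head:  &node{next: make([]*node, maxLevel)},
		level: 1,
		rnd:   rand.New(rand.NewSource(seed)),
	}
}

// randomLevel picks a height with geometrically decreasing probability.
func (s *skipList) randomLevel() int {
	lvl := 1
	for lvl < maxLevel && s.rnd.Intn(2) == 0 {
		lvl++
	}
	return lvl
}

// insert adds key, or overwrites the id if key is already present.
func (s *skipList) insert(key []byte, id uint64) {
	update := make([]*node, maxLevel)
	x := s.head
	for i := s.level - 1; i >= 0; i-- {
		for x.next[i] != nil && bytes.Compare(x.next[i].key, key) < 0 {
			x = x.next[i]
		}
		update[i] = x // last node before key on level i
	}
	if n := x.next[0]; n != nil && bytes.Equal(n.key, key) {
		n.id = id
		return
	}
	lvl := s.randomLevel()
	for i := s.level; i < lvl; i++ {
		update[i] = s.head
	}
	if lvl > s.level {
		s.level = lvl
	}
	n := &node{key: key, id: id, next: make([]*node, lvl)}
	for i := 0; i < lvl; i++ {
		n.next[i] = update[i].next[i]
		update[i].next[i] = n
	}
}

// floor returns the node with the greatest key <= key, or nil if none exists.
func (s *skipList) floor(key []byte) *node {
	x := s.head
	for i := s.level - 1; i >= 0; i-- {
		for x.next[i] != nil && bytes.Compare(x.next[i].key, key) <= 0 {
			x = x.next[i]
		}
	}
	if x == s.head {
		return nil
	}
	return x
}

func main() {
	s := newSkipList(1)
	s.insert([]byte("a"), 1)
	s.insert([]byte("m"), 2)
	fmt.Println(s.floor([]byte("k")).id, s.floor([]byte("z")).id, s.floor([]byte("A")) == nil)
}
```

Because an insert only rewires the links of a few neighboring nodes, a concurrent version can protect just those nodes instead of the whole structure, which is the finer lock granularity mentioned above.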

disksing, Jul 25 '22 04:07