
Support follower read

Open shiyuhang0 opened this issue 3 years ago • 1 comments

Problem

Currently, TiSpark only supports reading data from the TiKV leader, which may affect OLTP workloads.

To isolate OLAP traffic from OLTP traffic, we need to read from TiKV followers when we run OLAP workloads with TiSpark.

Goals

Support follower read

Solutions

We will add the following configs:

  • spark.tispark.replica_read: Read data from the specified role. The supported roles are leader, follower, and learner. You can also specify multiple roles, and they will be picked in the order you specify.
  • spark.tispark.replica_read.label: Only select TiKV stores that match the specified labels. Format: label_x=value_x,label_y=value_y
  • spark.tispark.replica_read.address_whitelist: Only select TiKV stores with the given IP addresses.
  • spark.tispark.replica_read.address_blacklist: Do not select TiKV stores with the given IP addresses.
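To make the value formats concrete, here is a minimal Python sketch of how the role list and the label string could be parsed. It is illustrative only and is not TiSpark code: the helper names `parse_roles` and `parse_labels`, the comma as the separator between multiple roles, and the sample values (`zone=olap`, `10.0.0.9`) are all assumptions.

```python
from typing import Dict, List

# Roles named in the proposal.
VALID_ROLES = ("leader", "follower", "learner")


def parse_roles(value: str) -> List[str]:
    """Split a role list, keeping the user's order.

    Assumption: multiple roles are comma-separated.
    """
    roles = [r.strip().lower() for r in value.split(",") if r.strip()]
    unknown = [r for r in roles if r not in VALID_ROLES]
    if unknown:
        raise ValueError(f"unknown replica role(s): {unknown}")
    return roles


def parse_labels(value: str) -> Dict[str, str]:
    """Parse 'label_x=value_x,label_y=value_y' into a dict."""
    labels: Dict[str, str] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input and trailing commas
        key, sep, val = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed label: {pair!r}")
        labels[key.strip()] = val.strip()
    return labels


# Hypothetical settings, using the config keys proposed above.
conf = {
    "spark.tispark.replica_read": "follower,leader",
    "spark.tispark.replica_read.label": "zone=olap,disk=ssd",
    "spark.tispark.replica_read.address_blacklist": "10.0.0.9",
}
roles = parse_roles(conf["spark.tispark.replica_read"])
labels = parse_labels(conf["spark.tispark.replica_read.label"])
```

Keeping the roles as an ordered list preserves the "pick in the order you specify" behaviour described for spark.tispark.replica_read.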

Here are some references from my investigation:

  • TiDB already supports follower read with strongly consistent reads, so it should be possible to support it in TiSpark too.
  • client-java added a replica selector in https://github.com/tikv/client-java/pull/151 and https://github.com/tikv/client-java/pull/171, so it is able to select TiKV stores by role.

And here are the main steps of the implementation:

  1. Pass in and read the configs above
  2. Add a new class ReplicaReadPolicy that implements the ReplicaSelector interface in client-java
  3. Override the select method and apply the role, label, whitelist, and blacklist configs
  4. Pass the ReplicaReadPolicy to TiConfiguration
  5. Create the TiSession with that TiConfiguration
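The selection logic in steps 2 and 3 could look roughly like the following sketch. It is written in Python for brevity and does not use the client-java API: the `Store` type, the `accepts` helper, and the sample addresses are hypothetical. It also assumes that a store must pass every filter (not blacklisted, in the whitelist when one is set, and matching all labels). The actual precedence between these filters is decided by the implementation in the linked PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Store:
    """Hypothetical stand-in for a TiKV store holding a region replica."""
    address: str  # IP address of the store
    role: str     # "leader", "follower" or "learner"
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class ReplicaReadPolicy:
    roles: List[str]  # preferred roles, in priority order
    labels: Dict[str, str] = field(default_factory=dict)
    whitelist: Set[str] = field(default_factory=set)
    blacklist: Set[str] = field(default_factory=set)

    def accepts(self, store: Store) -> bool:
        """Assumed semantics: a store must pass all three filters."""
        if store.address in self.blacklist:
            return False
        if self.whitelist and store.address not in self.whitelist:
            return False
        return all(store.labels.get(k) == v for k, v in self.labels.items())

    def select(self, stores: List[Store]) -> List[Store]:
        """Return accepted stores, ordered by the configured role priority."""
        candidates = [
            s for s in stores if s.role in self.roles and self.accepts(s)
        ]
        # sorted() is stable, so stores with the same role keep input order.
        return sorted(candidates, key=lambda s: self.roles.index(s.role))


stores = [
    Store("10.0.0.1", "leader", {"zone": "oltp"}),
    Store("10.0.0.2", "follower", {"zone": "olap"}),
    Store("10.0.0.3", "follower", {"zone": "olap"}),
    Store("10.0.0.4", "learner", {"zone": "olap"}),
]
policy = ReplicaReadPolicy(
    roles=["follower", "leader"],
    labels={"zone": "olap"},
    blacklist={"10.0.0.3"},
)
picked = [s.address for s in policy.select(stores)]
```

With these sample stores, only the follower at 10.0.0.2 is selected: the leader fails the label match, 10.0.0.3 is blacklisted, and the learner's role was not requested.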

Design

https://github.com/pingcap/tispark/pull/2546

Implementation

https://github.com/pingcap/tispark/pull/2546

Doc

https://github.com/pingcap/tispark/pull/2546

shiyuhang0 • Sep 05 '22 04:09

Cool.

ngaut • Sep 05 '22 09:09