Restore fails due to checksum timeout due to a huge region (size ~ 400 GB)

Open overvenus opened this issue 5 years ago • 0 comments

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.

Restore a 2TB tpcc dataset.

br restore full --s3.endpoint http://xxx:9000 -s "s3://tpcc/br-2t"

  2. What did you expect to see?

Restore success.

  3. What did you see instead?

Restore fails due to checksum timeout.

  4. What version of BR and TiDB/TiKV/PD are you using?

v5.0.0-rc

According to the BR log, the data is restored successfully, but the checksum times out:

  1. The checksum times out on a huge region (about 400 GB).
  2. From the TiKV log, we find that the huge region belongs to the stock table (table ID 54, key prefix 7480000000000000FF36).
  3. From the BR log, we find that BR split zero regions 33 times (which should not happen).
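
For readers who want to check the link between table ID 54 and the key prefix 7480000000000000FF36, here is a minimal Python sketch. It assumes the usual TiDB key layout: a `t` byte followed by the table ID as a sign-flipped big-endian int64, wrapped in the memcomparable bytes encoding (8-byte groups, each followed by a marker byte). The function names `table_prefix` and `encode_bytes` are illustrative only and are not BR or TiKV APIs.

```python
import struct

SIGN_MASK = 0x8000000000000000  # flips the sign bit so signed ints sort as unsigned
GROUP = 8                       # assumed memcomparable group size
MARKER = 0xFF                   # marker byte written after a full group


def table_prefix(table_id: int) -> bytes:
    """Raw (unencoded) table prefix: b't' + sign-flipped big-endian int64."""
    return b"t" + struct.pack(">Q", (table_id ^ SIGN_MASK) & 0xFFFFFFFFFFFFFFFF)


def encode_bytes(raw: bytes) -> bytes:
    """Memcomparable encoding: pad each 8-byte group with zeros and append
    a marker equal to 0xFF minus the number of padding bytes."""
    out = bytearray()
    for i in range(0, len(raw) + 1, GROUP):
        chunk = raw[i:i + GROUP]
        pad = GROUP - len(chunk)
        out += chunk + b"\x00" * pad
        out.append(MARKER - pad)
    return bytes(out)


encoded = encode_bytes(table_prefix(54)).hex().upper()
print(encoded)  # 7480000000000000FF3600000000000000F8
```

Under these assumptions, the encoded key starts with 7480000000000000FF36, which matches the prefix seen in the TiKV log: the FF is the marker after the first full 8-byte group, and 36 is hexadecimal for 54.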

br.log: https://drive.google.com/file/d/1iD0TOOhHrai7nEyTKw6F27TaaohoKgZp/view?usp=sharing

tikv.log: https://drive.google.com/file/d/19iRzPoowGxZS47dO1b6GYNijRwkfXPfJ/view?usp=sharing

overvenus, Jan 21 '21 05:01