Speed up backup/restore by skipping empty regions
Feature Request
Describe your feature request related problem:
Speed up backup/restore by skipping empty regions. Some clusters may contain many empty regions. As a result, a backup contains many empty files, which can be ignored during restore.
Here is an example.
➜ cat backupmeta.json | jq '.files | map(select(.cf == "write")) | length' | head
122333 # number of backed-up regions
➜ cat backupmeta.json | jq '.files | map(select(."total_bytes" != null and .cf == "write")) | length' | head
17095 # number of non-empty regions
Describe the feature you'd like:
Speed up backup/restore by skipping empty regions. Also, we must ensure that the regions' key ranges remain continuous in the restored cluster.
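To illustrate the continuity requirement, here is a minimal Python sketch that reports holes in a set of key ranges. It is not BR code: `find_gaps` is a hypothetical helper, and it assumes each range is a `(start_key, end_key)` pair of plain byte strings with an exclusive, non-empty end key.

```python
from typing import List, Tuple

KeyRange = Tuple[bytes, bytes]  # (start_key, end_key), end exclusive (assumed)


def find_gaps(ranges: List[KeyRange]) -> List[KeyRange]:
    """Return the key ranges not covered by any of the given ranges.

    Only gaps *between* ranges are reported; nothing is said about keys
    before the first range or after the last one.
    """
    gaps: List[KeyRange] = []
    covered_end = None  # largest end key seen so far
    for start, end in sorted(ranges):
        if covered_end is not None and start > covered_end:
            # Nothing covers [covered_end, start): this is a hole.
            gaps.append((covered_end, start))
        if covered_end is None or end > covered_end:
            covered_end = end
    return gaps


# Skipping the empty range [f, h) leaves a hole that restore would have to fill.
restored = [(b"a", b"c"), (b"c", b"f"), (b"h", b"k")]
print(find_gaps(restored))
```

If skipping empty regions produces a non-empty result from a check like this, the restore side would need to cover the hole some other way, for example by extending a neighbouring region's range.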
Describe alternatives you've considered:
- [ ] 1. Add a subcommand to the debug command to optimize backupmeta (filter out empty files).
- [ ] 2. Automatically filter out empty files during restore (without modifying backupmeta).
- [x] 3. Do not generate backup files for empty regions during backup.
1 and 2 speed up restoring an existing backup. 3 speeds up future backup and restore.
Teachability, Documentation, Adoption, Migration Strategy:
Why not just avoid generating those empty SSTs in the first place?
Updated, thanks for your suggestion!
It seems br won't generate empty SSTs, because we already have this check?
so where did the total_bytes = null SSTs come from 😂
It turns out total_bytes is only recorded once per region, either in the write CF SST or in the default CF SST. So there are actually no empty regions, only small ones.
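Given that, a more reliable emptiness check would sum total_bytes across both CFs of the same key range instead of looking at the write CF alone. A rough Python sketch, under these assumptions: each file entry carries `start_key` and `end_key` fields that identify its range (these field names are not shown in the example above), and `total_bytes` may be missing, null, a number, or a numeric string. `bytes_per_range` and `truly_empty_ranges` are hypothetical helpers, not part of br.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

RangeKey = Tuple[str, str]  # (start_key, end_key) as they appear in the JSON


def bytes_per_range(files: Iterable[dict]) -> Dict[RangeKey, int]:
    """Sum total_bytes over all CF files that share the same key range."""
    totals: Dict[RangeKey, int] = defaultdict(int)
    for f in files:
        key = (f.get("start_key"), f.get("end_key"))
        # A missing or null total_bytes counts as 0 for this file only.
        totals[key] += int(f.get("total_bytes") or 0)
    return dict(totals)


def truly_empty_ranges(files: Iterable[dict]) -> List[RangeKey]:
    """Ranges for which no file in any CF reports any bytes."""
    return [key for key, total in bytes_per_range(files).items() if total == 0]


# Usage (assumes backupmeta.json has a top-level "files" list, as in the
# jq example):
#   import json
#   with open("backupmeta.json") as fh:
#       print(len(truly_empty_ranges(json.load(fh)["files"])))
```

With a check like this, a write CF file whose total_bytes is null no longer marks its range as empty when the default CF file for the same range reports the bytes.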