bolt compact increases the size of database instead of shrinking it
I have a small boltdb database, so I don't really need compaction, but I tried the bolt compact command out of curiosity. Here is the small boltdb file:
parallels@ubuntu$ ls -lh src.db
-rw-rw-r-- 1 parallels parallels 4.0M May 21 08:12 src.db
I ran the bolt compact command to test shrinking:
parallels@ubuntu$ bolt compact -o dst.db src.db
4194304 -> 8388608 bytes (gain=0.50x)
As the command output and the directory listing below show, it increases the size from 4MB to 8MB.
parallels@ubuntu$ ls -lh
total 13M
-rw-rw-r-- 1 parallels parallels 8.0M May 21 08:23 dst.db
-rw-rw-r-- 1 parallels parallels 4.0M May 21 08:12 src.db
Here is the output of bolt stats for both files:
parallels@ubuntu$ bolt stats src.db
Aggregate statistics for 4 buckets
Page count statistics
Number of logical branch pages: 25
Number of physical branch overflow pages: 0
Number of logical leaf pages: 915
Number of physical leaf overflow pages: 0
Tree statistics
Number of keys/value pairs: 34038
Number of levels in B+tree: 3
Page size utilization
Bytes allocated for physical branch pages: 102400
Bytes actually used for branch data: 62442 (60%)
Bytes allocated for physical leaf pages: 3747840
Bytes actually used for leaf data: 2327107 (62%)
Bucket statistics
Total number of buckets: 4
Total number on inlined buckets: 0 (0%)
Bytes used for inlined buckets: 0 (0%)
parallels@ubuntu$ bolt stats dst.db
Aggregate statistics for 4 buckets
Page count statistics
Number of logical branch pages: 40
Number of physical branch overflow pages: 0
Number of logical leaf pages: 1157
Number of physical leaf overflow pages: 0
Tree statistics
Number of keys/value pairs: 34038
Number of levels in B+tree: 3
Page size utilization
Bytes allocated for physical branch pages: 163840
Bytes actually used for branch data: 82448 (50%)
Bytes allocated for physical leaf pages: 4739072
Bytes actually used for leaf data: 2330979 (49%)
Bucket statistics
Total number of buckets: 4
Total number on inlined buckets: 0 (0%)
Bytes used for inlined buckets: 0 (0%)
Question: I expected that, in the worst case, the size would stay the same as the original boltdb file, since the number of key/value pairs and buckets is unchanged. Can someone explain what is happening underneath?
Additional information: the src DB was written in two passes, with two buckets written in the first pass and one bucket in the second.
(Sorry if there is a mailing list I should have used instead.)
When a B+tree page fills up, it is split into two pages. By default the split is 50-50 (bucket.FillPercent = 0.5). This is optimal for random inserts, but when keys are inserted in sequential order, as compaction does when it copies a database in key order, the 50-50 split leaves behind many half-full pages (see the 50% and 49% utilization in the dst.db stats). A higher fill percent would reduce the wasted space, though setting it too high would slow down future inserts, since inserting into a nearly full page forces another split.
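To see why sequential inserts leave pages about half full, here is a toy model in Go; it is not bolt's actual node-splitting code. It appends fixed-size entries in ascending key order, and when the rightmost page would overflow, the left page keeps as many whole entries as fit under pageSize*fillPercent while the rest move to a new rightmost page. The function name simulate is hypothetical, and the inputs (34038 entries of about 68 bytes each, from 2327107 / 34038, on 4096-byte pages) are rough assumptions taken from the src.db stats.

```go
package main

import "fmt"

// simulate is a toy model, not bolt's implementation. It appends n entries of
// entrySize bytes (entrySize must not exceed pageSize) in ascending key order.
// When the rightmost page would overflow, it is split: the left page keeps as
// many whole entries as fit under pageSize*fillPercent and the rest move to a
// new rightmost page. Sequential keys never land in a sealed page again, so
// each sealed page stays at roughly fillPercent utilization.
func simulate(n, entrySize, pageSize int, fillPercent float64) (pages int, util float64) {
	threshold := int(float64(pageSize) * fillPercent)
	keep := (threshold / entrySize) * entrySize // bytes left behind on each split
	if keep < entrySize {
		keep = entrySize // always leave at least one entry behind
	}
	sealed := 0 // pages that will never receive another key
	cur := 0    // bytes used in the rightmost page
	for i := 0; i < n; i++ {
		if cur+entrySize > pageSize {
			sealed++
			cur -= keep
		}
		cur += entrySize
	}
	pages = sealed
	if cur > 0 {
		pages++
	}
	util = float64(n*entrySize) / float64(pages*pageSize)
	return pages, util
}

func main() {
	// Assumed inputs shaped loosely like src.db's leaf level.
	for _, fp := range []float64{0.5, 0.9, 1.0} {
		pages, util := simulate(34038, 68, 4096, fp)
		fmt.Printf("fillPercent=%.1f: %d leaf pages, %.0f%% utilization\n", fp, pages, util*100)
	}
}
```

With these inputs, a fill percent of 0.5 produces roughly twice as many leaf pages as 1.0, each about half full, which is the same pattern the dst.db stats show.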
I see a fill-percent option in the code, but I think it's only used by the benchmark command. Maybe compact should support it too.
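If compact were to accept a fill percent, the flag handling might look like the sketch below, which uses only Go's standard flag package. The -fill-percent flag on compact, the parseCompactFlags helper, and the 0.1 to 1.0 valid range are all hypothetical. A real implementation would also have to set FillPercent on each destination bucket handle before copying its keys; this sketch only notes that step in a comment.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// defaultFillPercent mirrors bolt's default split point (bucket.FillPercent = 0.5).
const defaultFillPercent = 0.5

// compactOptions holds settings for a hypothetical compact command.
type compactOptions struct {
	dstPath     string
	fillPercent float64
}

// parseCompactFlags parses arguments such as "-o dst.db -fill-percent 0.9 src.db"
// and returns the options plus the source path. The -fill-percent flag and the
// 0.1..1.0 range are assumptions for illustration.
func parseCompactFlags(args []string) (compactOptions, string, error) {
	opts := compactOptions{}
	fs := flag.NewFlagSet("compact", flag.ContinueOnError)
	fs.StringVar(&opts.dstPath, "o", "", "destination database path")
	fs.Float64Var(&opts.fillPercent, "fill-percent", defaultFillPercent,
		"fraction of each page to fill before splitting")
	if err := fs.Parse(args); err != nil {
		return opts, "", err
	}
	if opts.dstPath == "" {
		return opts, "", errors.New("-o destination path is required")
	}
	if fs.NArg() != 1 {
		return opts, "", fmt.Errorf("expected one source path, got %d", fs.NArg())
	}
	if opts.fillPercent < 0.1 || opts.fillPercent > 1.0 {
		return opts, "", fmt.Errorf("fill-percent must be in [0.1, 1.0], got %g", opts.fillPercent)
	}
	return opts, fs.Arg(0), nil
}

func main() {
	// Example invocation: compact -o dst.db -fill-percent 0.9 src.db
	opts, src, err := parseCompactFlags([]string{"-o", "dst.db", "-fill-percent", "0.9", "src.db"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// A real compact would open both files with bolt and, inside each write
	// transaction, set b.FillPercent = opts.fillPercent on every destination
	// bucket before calling b.Put for the copied keys.
	fmt.Printf("compact %s -> %s (fill percent %.2f)\n", src, opts.dstPath, opts.fillPercent)
}
```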