bolt compact increases the size of database instead of shrinking it

Open awmanoj opened this issue 8 years ago • 1 comments

I have a small boltdb database, so I don't really need compaction, but I tried the bolt compact command out of curiosity. This is the boltdb file:

parallels@ubuntu$ ls -lh src.db 
-rw-rw-r-- 1 parallels parallels 4.0M May 21 08:12 src.db

I ran the bolt compact command to test shrinking:

parallels@ubuntu$ bolt compact -o dst.db src.db
4194304 -> 8388608 bytes (gain=0.50x)

As the command output above and the listing below show, the size increases from 4 MB to 8 MB.

parallels@ubuntu$ ls -lh
total 13M
-rw-rw-r-- 1 parallels parallels 8.0M May 21 08:23 dst.db
-rw-rw-r-- 1 parallels parallels 4.0M May 21 08:12 src.db
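
For readers checking the numbers, here is a minimal Go sketch of the arithmetic behind the `gain=0.50x` figure. It assumes gain is simply the source size divided by the destination size, which matches the printed value. The function name `gain` is made up for this illustration.

```go
package main

import "fmt"

// gain is a hypothetical helper: source size divided by destination size.
// A value below 1 means the destination file is larger than the source.
func gain(srcBytes, dstBytes int64) float64 {
	return float64(srcBytes) / float64(dstBytes)
}

func main() {
	// Sizes taken from the compact output above.
	src, dst := int64(4194304), int64(8388608)
	fmt.Printf("%d -> %d bytes (gain=%.2fx)\n", src, dst, gain(src, dst))
	// Shifting right by 20 bits converts bytes to MiB.
	fmt.Printf("src = %d MiB, dst = %d MiB\n", src>>20, dst>>20)
}
```

Both sizes are exact powers of two (4 MiB and 8 MiB), so the destination file is exactly twice the size of the source.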

Following are the outputs of bolt stats:

parallels@ubuntu$ bolt stats src.db 
Aggregate statistics for 4 buckets

Page count statistics
	Number of logical branch pages: 25
	Number of physical branch overflow pages: 0
	Number of logical leaf pages: 915
	Number of physical leaf overflow pages: 0
Tree statistics
	Number of keys/value pairs: 34038
	Number of levels in B+tree: 3
Page size utilization
	Bytes allocated for physical branch pages: 102400
	Bytes actually used for branch data: 62442 (60%)
	Bytes allocated for physical leaf pages: 3747840
	Bytes actually used for leaf data: 2327107 (62%)
Bucket statistics
	Total number of buckets: 4
	Total number on inlined buckets: 0 (0%)
	Bytes used for inlined buckets: 0 (0%)

parallels@ubuntu$ bolt stats dst.db 
Aggregate statistics for 4 buckets

Page count statistics
	Number of logical branch pages: 40
	Number of physical branch overflow pages: 0
	Number of logical leaf pages: 1157
	Number of physical leaf overflow pages: 0
Tree statistics
	Number of keys/value pairs: 34038
	Number of levels in B+tree: 3
Page size utilization
	Bytes allocated for physical branch pages: 163840
	Bytes actually used for branch data: 82448 (50%)
	Bytes allocated for physical leaf pages: 4739072
	Bytes actually used for leaf data: 2330979 (49%)
Bucket statistics
	Total number of buckets: 4
	Total number on inlined buckets: 0 (0%)
	Bytes used for inlined buckets: 0 (0%)
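
The two reports are easier to compare side by side. The following Go sketch recomputes the utilization percentages and the total bytes allocated to branch and leaf pages, using only the numbers printed above. The type and function names are made up for this illustration and are not part of bolt.

```go
package main

import "fmt"

// pageStats holds the "Page size utilization" figures from bolt stats.
type pageStats struct {
	name                   string
	branchAlloc, branchUse int64
	leafAlloc, leafUse     int64
}

// utilization returns used/allocated as a percentage.
func utilization(used, alloc int64) float64 {
	return 100 * float64(used) / float64(alloc)
}

// totalAlloc returns the bytes allocated to branch and leaf pages together.
func (s pageStats) totalAlloc() int64 { return s.branchAlloc + s.leafAlloc }

func main() {
	// Values copied from the two bolt stats outputs above.
	stats := []pageStats{
		{"src.db", 102400, 62442, 3747840, 2327107},
		{"dst.db", 163840, 82448, 4739072, 2330979},
	}
	for _, s := range stats {
		fmt.Printf("%s: branch %.1f%% used, leaf %.1f%% used, %d bytes allocated (%.2f MiB)\n",
			s.name,
			utilization(s.branchUse, s.branchAlloc),
			utilization(s.leafUse, s.leafAlloc),
			s.totalAlloc(),
			float64(s.totalAlloc())/(1<<20))
	}
}
```

The leaf data is almost the same size in both files (about 2.3 MB), but dst.db spreads it over 1157 leaf pages instead of 915. That pushes the allocated page bytes from roughly 3.7 MiB to roughly 4.7 MiB, which no longer fits in a 4 MiB file. Why the file ends up at exactly 8 MiB is not explained in this thread. One possible explanation, which is only an assumption, is that the file size grows in power-of-two steps.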

Question: I expected that, in the worst case, the size would remain the same as the original boltdb file, since the key-value pairs and the number of buckets are unchanged. Can someone help explain this behavior? What is happening underneath?

Additional information: The src DB was written in two passes: two buckets in the first pass and one bucket in the second.

(Sorry if there is a mailing list that I should have used instead.)

awmanoj, May 21 '17 01:05

When a B+tree page fills up, it is split into two pages. By default, the split is 50-50 (bucket.FillPercent = 0.5). This is optimal for random inserts, but when keys are inserted in consecutive order, as compact does when it copies the data, the 50-50 split leaves behind a lot of half-full pages (see the 50% and 49% in the dst.db stats). A higher fill percent would reduce the wasted space, though setting it too high would slow down future inserts, because nearly full pages have to be split more often.
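
To make the effect concrete, here is a small Go simulation of the splitting behavior described above. It is a simplified model and not bolt's actual code. It assumes every page holds a fixed number of keys (`pageCap`, a made-up constant), that keys arrive in ascending order so they always land in the last page, and that a full page is split immediately on insert. As far as I know, bolt measures pages in bytes and splits them when a transaction commits.

```go
package main

import "fmt"

const pageCap = 100 // hypothetical: number of keys that fit in one page

// fillSequential inserts n ascending keys into a chain of leaf pages and
// returns the key count of each page. When the last page is full it is split:
// the left page keeps about fillPercent*pageCap keys and the rest move to a
// new right page, which then receives the incoming key.
func fillSequential(n int, fillPercent float64) []int {
	pages := []int{0}
	keep := int(fillPercent*pageCap + 0.5) // round to nearest whole key
	for i := 0; i < n; i++ {
		last := len(pages) - 1
		if pages[last] == pageCap {
			moved := pages[last] - keep
			pages[last] = keep
			pages = append(pages, moved)
			last++
		}
		pages[last]++
	}
	return pages
}

// utilization returns the fraction of total page capacity that holds keys.
func utilization(pages []int) float64 {
	used := 0
	for _, k := range pages {
		used += k
	}
	return float64(used) / float64(len(pages)*pageCap)
}

func main() {
	// 34038 is the key count reported by bolt stats above.
	for _, fp := range []float64{0.5, 0.9} {
		pages := fillSequential(34038, fp)
		fmt.Printf("fill percent %.1f: %d pages, %.0f%% of capacity used\n",
			fp, len(pages), 100*utilization(pages))
	}
}
```

In this model, a fill percent of 0.5 leaves every page except the last about half full, which is close to the 49% leaf utilization reported for dst.db. A fill percent of 0.9 needs noticeably fewer pages for the same keys. In application code, the corresponding setting is the `FillPercent` field on a bolt `Bucket`, set inside a write transaction before inserting.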

I see a fill-percent option in the code, but I think it's only used by the benchmark command. Maybe compact should support it too.

dtfinch, May 23 '17 19:05