badger icon indicating copy to clipboard operation
badger copied to clipboard

[Hackathon2020 - Do not merge] Greedy-piecewise-linear-regression (PLR) to speed up int type key search for LSM-tree (bourbon idea)

Open liufuyang opened this issue 5 years ago • 0 comments

Disclaimer

Original implementation made by @spongedu at here https://github.com/spongedu/badger/tree/hackathon_go_plr_with_pointget_uint64key I ported over and did some clean up work.

Idea from paper: From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees, by Yifan Dai

Rust code source:

  • binary plr (which used in badger code) code here https://github.com/liufuyang/plr-test/blob/master/src/main.rs
  • plr library code here https://github.com/RyanMarcus/plr

Benchmark code here:

https://github.com/liufuyang/tbadger/blob/master/badger_test.go

With PLR, with pointGet

(fuyang-go-bench-badger *%) 🦀 👉  go test -bench=BenchmarkBadger  -benchtime=30s -run=^s
2021/01/16 18:36:38 plr opening file: data/00000004.sst
2021/01/16 18:36:38 data/00000004.sst
2021/01/16 18:36:38 plrSegment loaded: [{0 43371 0.002763512135689483 1.301027126230357} {43371 173143 0.001957717574102289 32.139598333991884} {173143 436947 0.0016452183457147617 82.19281535952956} {436947 833712 0.0014415577306141915 167.1788822889456} {833712 1.393807e+06 0.0013017392682151607 279.7244833388081} {1.393807e+06 2.144502e+06 0.0012110451657496644 402.0680961828125} {2.144502e+06 3.243884e+06 0.001120694858783678 591.6862955068373} {3.243884e+06 4.252725e+06 0.001064653793312162 769.4048384938992} {4.252725e+06 5.489191e+06 0.0010158619299836292 972.7984525022184} {5.489191e+06 7.260306e+06 0.0009696070011254119 1222.5316082781937} {7.260306e+06 8.683148e+06 0.000932753999857703 1485.8753435962026} {8.683148e+06 1.0541136e+07 0.0009026047455994166 1743.6066808302094} {1.0541136e+07 1.2621543e+07 0.0008782587667118885 1996.1946893039676} {1.2621543e+07 1.4617779e+07 0.0008541220294640251 2296.6850507190775} {1.4617779e+07 1.7976931348623157e+308 0.0008369829862892688 2543.212360918902}]
-------> Hit rate: 0.000000
goos: darwin
goarch: amd64
pkg: tbadger
BenchmarkBadger-12    	-------> Hit rate: 0.040000
-------> Hit rate: 0.042700
-------> Hit rate: 0.039912
-------> Hit rate: 0.039828
-------> Hit rate: 0.039901
40003507	       985 ns/op
PASS
ok  	tbadger	69.299s

WithoutPLR, with pointGet

(fuyang-go-bench-badger *%) 🦀 👉  go test -bench=BenchmarkBadger  -benchtime=30s -run=^s
2021/01/16 18:40:05 plr opening file: data/00000004.sst
2021/01/16 18:40:05 data/00000004.sst
2021/01/16 18:40:05 plrSegment loaded: [{0 43371 0.002763512135689483 1.301027126230357} {43371 173143 0.001957717574102289 32.139598333991884} {173143 436947 0.0016452183457147617 82.19281535952956} {436947 833712 0.0014415577306141915 167.1788822889456} {833712 1.393807e+06 0.0013017392682151607 279.7244833388081} {1.393807e+06 2.144502e+06 0.0012110451657496644 402.0680961828125} {2.144502e+06 3.243884e+06 0.001120694858783678 591.6862955068373} {3.243884e+06 4.252725e+06 0.001064653793312162 769.4048384938992} {4.252725e+06 5.489191e+06 0.0010158619299836292 972.7984525022184} {5.489191e+06 7.260306e+06 0.0009696070011254119 1222.5316082781937} {7.260306e+06 8.683148e+06 0.000932753999857703 1485.8753435962026} {8.683148e+06 1.0541136e+07 0.0009026047455994166 1743.6066808302094} {1.0541136e+07 1.2621543e+07 0.0008782587667118885 1996.1946893039676} {1.2621543e+07 1.4617779e+07 0.0008541220294640251 2296.6850507190775} {1.4617779e+07 1.7976931348623157e+308 0.0008369829862892688 2543.212360918902}]
-------> Hit rate: 0.000000
goos: darwin
goarch: amd64
pkg: tbadger
BenchmarkBadger-12    	-------> Hit rate: 0.060000
-------> Hit rate: 0.039500
-------> Hit rate: 0.039985
-------> Hit rate: 0.039894
-------> Hit rate: 0.039881
38633751	      1025 ns/op
PASS
ok  	tbadger	60.518s

WithPLR, without pointGet

(master %) 🦀 👉  go test -bench=BenchmarkBadger  -benchtime=30s -run=^s
2021/01/16 18:52:33 plr opening file: data/00000004.sst
2021/01/16 18:52:33 data/00000004.sst
2021/01/16 18:52:33 plrSegment loaded: [{0 43371 0.002763512135689483 1.301027126230357} {43371 173143 0.001957717574102289 32.139598333991884} {173143 436947 0.0016452183457147617 82.19281535952956} {436947 833712 0.0014415577306141915 167.1788822889456} {833712 1.393807e+06 0.0013017392682151607 279.7244833388081} {1.393807e+06 2.144502e+06 0.0012110451657496644 402.0680961828125} {2.144502e+06 3.243884e+06 0.001120694858783678 591.6862955068373} {3.243884e+06 4.252725e+06 0.001064653793312162 769.4048384938992} {4.252725e+06 5.489191e+06 0.0010158619299836292 972.7984525022184} {5.489191e+06 7.260306e+06 0.0009696070011254119 1222.5316082781937} {7.260306e+06 8.683148e+06 0.000932753999857703 1485.8753435962026} {8.683148e+06 1.0541136e+07 0.0009026047455994166 1743.6066808302094} {1.0541136e+07 1.2621543e+07 0.0008782587667118885 1996.1946893039676} {1.2621543e+07 1.4617779e+07 0.0008541220294640251 2296.6850507190775} {1.4617779e+07 1.7976931348623157e+308 0.0008369829862892688 2543.212360918902}]
-------> Hit rate: 0.000000
goos: darwin
goarch: amd64
pkg: tbadger
BenchmarkBadger-12    	-------> Hit rate: 0.030000
-------> Hit rate: 0.039600
-------> Hit rate: 0.039574
-------> Hit rate: 0.039863
-------> Hit rate: 0.039875
18224655	      1865 ns/op
PASS
ok  	tbadger	60.895s

WithoutPLR, without pointGet

(master %) 🦀 👉  go test -bench=BenchmarkBadger  -benchtime=30s -run=^s
2021/01/16 18:50:51 plr opening file: data/00000004.sst
2021/01/16 18:50:51 data/00000004.sst
2021/01/16 18:50:51 plrSegment loaded: [{0 43371 0.002763512135689483 1.301027126230357} {43371 173143 0.001957717574102289 32.139598333991884} {173143 436947 0.0016452183457147617 82.19281535952956} {436947 833712 0.0014415577306141915 167.1788822889456} {833712 1.393807e+06 0.0013017392682151607 279.7244833388081} {1.393807e+06 2.144502e+06 0.0012110451657496644 402.0680961828125} {2.144502e+06 3.243884e+06 0.001120694858783678 591.6862955068373} {3.243884e+06 4.252725e+06 0.001064653793312162 769.4048384938992} {4.252725e+06 5.489191e+06 0.0010158619299836292 972.7984525022184} {5.489191e+06 7.260306e+06 0.0009696070011254119 1222.5316082781937} {7.260306e+06 8.683148e+06 0.000932753999857703 1485.8753435962026} {8.683148e+06 1.0541136e+07 0.0009026047455994166 1743.6066808302094} {1.0541136e+07 1.2621543e+07 0.0008782587667118885 1996.1946893039676} {1.2621543e+07 1.4617779e+07 0.0008541220294640251 2296.6850507190775} {1.4617779e+07 1.7976931348623157e+308 0.0008369829862892688 2543.212360918902}]
-------> Hit rate: 0.000000
goos: darwin
goarch: amd64
pkg: tbadger
BenchmarkBadger-12    	-------> Hit rate: 0.030000
-------> Hit rate: 0.037500
-------> Hit rate: 0.039512
-------> Hit rate: 0.039923
14039182	      2240 ns/op
PASS
ok  	tbadger	34.574s

Summary:

  • With PLR, with pointGet 985 ns/op
  • W/ot PLR, with pointGet 1025 ns/op (Baseline)

Turn off badger index/pointGet to see the effect of PLR vs bare binary search

  • With PLR, w/ot pointGet 1865 ns/op
  • W/ot PLR, w/ot pointGet 2240 ns/op

liufuyang avatar Jan 16 '21 17:01 liufuyang