stellar-core icon indicating copy to clipboard operation
stellar-core copied to clipboard

Binary fuse filter

Open SirTyson opened this issue 1 year ago • 2 comments

Description

Addresses #4393.

This PR adds a Binary Fuse Filter library and switches BucketListDB to use the new filter.

The Binary Fuse Filter library is based on this library, but I ended up essentially rewriting it in C++. The original library had a bug where some filter bytes were non-deterministic. While this did not affect correctness or the false positive rate, we need determinism because State Archival will hash these filters into the ledger. Additionally, the library did not support 32 bit filters, which are required for State Archival.

Currently a draft until the (non protocol breaking) associated XDR changes are merged: https://github.com/stellar/stellar-xdr/pull/195

While Binary Fuse Filters are necessary for State Archival, they also significantly improve BucketListDB. Compared to the older bloom filter implementation, Bucket indexing time has decreased by 67% (which is blocking on startup in some cases), decreases index memory size by 20%, and reduces redundant disk reads by ~130x.

Checklist

  • [x] Reviewed the contributing document
  • [x] Rebased on top of master (no merge commits)
  • [x] Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • [x] Compiles
  • [x] Ran all tests
  • [ ] If change impacts performance, include supporting evidence per the performance document

SirTyson avatar Jul 18 '24 18:07 SirTyson

we'll have to put a note in the release notes that this will cause a reindexing of buckets on startup which will cause nodes to be offline for longer than normal

MonsieurNicolas avatar Jul 22 '24 17:07 MonsieurNicolas

I also was wondering if you considered a version that doesn't have the "populated" flag, but that returns an optional<> from the populate function, i.e. treat it as a sort of fallible constructor. I don't have a super strong preference for this -- it's not really idiomatic C++ the way it would be idiomatic Rust -- but I'm not thrilled with the 2-phase initialization and need to check-and-fail on the non-populated state. As far as I can tell, no client should ever have a reason to hold a non-populated filter, right? I wonder if there's a way to bake that further into the API.

I tried to enforce this with the [[nodiscard]] bool on the populate function. I agree the interface is not ideal, but because the caller constructing the filter may need to rotate the input seed, there is a "valid" reason that a caller may need to hold a non populated filter during construction while finding the correct seed. I'm not sure if an optional is right here. I think I'll just increase the max attempt count to a highly ridiculous number (as this is filter is part of consensus and construction should never fail) and throw in the constructor if it fails.

SirTyson avatar Jul 30 '24 23:07 SirTyson