iceberg-rust

Object Cache: caches parsed Manifests and ManifestLists for performance

sdd opened this issue 1 year ago • 3 comments

This builds on top of the concurrent scans PR and so needs to be merged after that.

It caches parsed instances of Manifest and ManifestList objects so that they are not re-fetched and re-parsed when a subsequent scan needs the same object. Experiments on the test data in my perf testing branch show that this can reduce the time taken for plan_files to execute a second time from 650ms to 5ms, even when the second call uses a different filter predicate.
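The core idea (fetch and parse a file once, then hand out shared references on later requests) can be sketched with only the standard library. This is a minimal illustration, not the PR's code: the actual change uses moka and iceberg-rust's own Manifest/ManifestList types, while `ParsedObjectCache`, `get_or_load` and `ParsedManifest` here are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a parsed manifest; the real iceberg-rust type differs.
#[derive(Debug)]
pub struct ParsedManifest {
    pub path: String,
    pub entry_count: usize,
}

/// Hypothetical cache of parsed objects keyed by file path, so a repeated
/// scan reuses the parsed value instead of fetching and parsing the file again.
pub struct ParsedObjectCache<T> {
    entries: Mutex<HashMap<String, Arc<T>>>,
}

impl<T> ParsedObjectCache<T> {
    pub fn new() -> Self {
        Self {
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the cached value for `path`, or runs `load` (fetch + parse)
    /// exactly once and stores the result for later callers.
    /// The lock is held while loading: simple, but it serializes cache misses.
    pub fn get_or_load<F>(&self, path: &str, load: F) -> Arc<T>
    where
        F: FnOnce(&str) -> T,
    {
        let mut entries = self.entries.lock().unwrap();
        if let Some(hit) = entries.get(path) {
            // Cache hit: cloning the Arc is cheap; no I/O or parsing happens.
            return Arc::clone(hit);
        }
        let value = Arc::new(load(path));
        entries.insert(path.to_string(), Arc::clone(&value));
        value
    }
}
```

Returning `Arc<T>` lets concurrent scans share one parsed instance rather than each holding its own copy.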

The cache is an LRU cache implemented using the great moka crate. By default the cache size is 32 MB, but it can be configured to any size or disabled entirely.
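To show what a size-bounded LRU with a "disabled" setting means, here is a small std-only sketch. It is an assumption-laden illustration, not moka and not the PR's configuration API: `LruByteCache`, `DEFAULT_CAPACITY_BYTES` and the per-entry `size_bytes` estimate are hypothetical, the default assumes 32 MiB (32 × 1024 × 1024 bytes), and a capacity of 0 stands in for "disabled".

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical default budget, assuming the 32 MB default means 32 MiB.
pub const DEFAULT_CAPACITY_BYTES: usize = 32 * 1024 * 1024;

/// Hypothetical LRU keyed by file path. Each entry carries an estimated size
/// in bytes; least recently used entries are evicted once the total would
/// exceed the budget. A budget of 0 disables caching entirely.
pub struct LruByteCache<V> {
    capacity_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, (V, usize)>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl<V: Clone> LruByteCache<V> {
    pub fn new(capacity_bytes: usize) -> Self {
        Self {
            capacity_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Moves `key` to the most-recently-used end (O(n); fine for a sketch).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }

    pub fn get(&mut self, key: &str) -> Option<V> {
        let value = self.entries.get(key).map(|(v, _)| v.clone())?;
        self.touch(key);
        Some(value)
    }

    pub fn insert(&mut self, key: &str, value: V, size_bytes: usize) {
        // Disabled cache, or an item larger than the whole budget: don't store it.
        if self.capacity_bytes == 0 || size_bytes > self.capacity_bytes {
            return;
        }
        // Replacing an existing entry: release its old size first.
        if let Some((_, old_size)) = self.entries.remove(key) {
            self.used_bytes -= old_size;
            if let Some(pos) = self.order.iter().position(|k| k == key) {
                self.order.remove(pos);
            }
        }
        // Evict least recently used entries until the new one fits.
        while self.used_bytes + size_bytes > self.capacity_bytes {
            let oldest = match self.order.pop_front() {
                Some(k) => k,
                None => break,
            };
            if let Some((_, sz)) = self.entries.remove(&oldest) {
                self.used_bytes -= sz;
            }
        }
        self.entries.insert(key.to_string(), (value, size_bytes));
        self.order.push_back(key.to_string());
        self.used_bytes += size_bytes;
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Bounding by estimated bytes rather than entry count matters here because a single manifest can be far larger than another; a production cache such as moka also handles concurrency and more efficient eviction bookkeeping, which this sketch omits.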

sdd, Jul 31 '24 01:07

cc @sdd Would you mind updating the PR to resolve conflicts?

liurenjie1024, Aug 09 '24 09:08

Sure, will do so in a few hours' time.

sdd, Aug 09 '24 11:08

Thanks for the review @Xuanwo, much appreciated! Back to you, I've addressed your comments.

sdd, Aug 09 '24 22:08

The security audit failure is tracked in #559; I'll merge this first. Thanks @sdd!

liurenjie1024, Aug 19 '24 09:08