Luca Cappelletti
I am running several benchmarks and can add that one as well. Thanks for pointing it out.
> Hi @LucaCappelletti94,
>
> Thanks for letting me know! Did you benchmark against the MLE and improved estimator in Ertl's paper, as implemented in SourMash (https://docs.rs/sourmash/0.15.0/sourmash/sketch/hyperloglog/estimators/index.html)? This is a...
> And also this one, hypertwobits (https://github.com/axiomhq/hypertwobits/tree/main), a very new one; the paper was published just last month. I am a little bit concerned about it since the benchmarks showed very large...
> I would also be interested in UltraLogLog (https://dl.acm.org/doi/abs/10.14778/3654621.3654632) and ExaLogLog (https://arxiv.org/abs/2402.13726), but no implementation is available in Rust. All of Ertl's implementations are in the hash4j Java library, which I...
I have now also added support for serde.
I have also added const generics, similar to what you are doing [in this pull request](https://github.com/compenguy/ngrammatic/pull/5). I also updated all the tests, and they all pass.
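For readers unfamiliar with const generics, here is a minimal sketch of the general idea, not the actual ngrammatic code. The `Ngram` type and the `ngrams` function are hypothetical names invented for this illustration: the n-gram arity `N` becomes a compile-time parameter, so each gram is stored inline in a fixed-size array instead of a heap-allocated `String` or `Vec`.

```rust
use std::convert::TryFrom;

/// Hypothetical n-gram type (an assumption for illustration only).
/// The arity `N` is known at compile time, so the bytes are stored
/// inline and the struct occupies exactly `N` bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Ngram<const N: usize>(pub [u8; N]);

/// Collect every window of `N` consecutive bytes of `text`.
/// Returns an empty vector when `text` is shorter than `N`.
/// Note: `N` must be greater than zero, because `slice::windows`
/// panics on a window size of 0.
pub fn ngrams<const N: usize>(text: &str) -> Vec<Ngram<N>> {
    text.as_bytes()
        .windows(N)
        .map(|w| {
            // Each window has exactly N bytes, so the conversion cannot fail.
            Ngram(<[u8; N]>::try_from(w).expect("window has exactly N bytes"))
        })
        .collect()
}
```

Calling `ngrams::<3>("hello")` yields the three trigrams `hel`, `ell`, and `llo`. Because the size is fixed per type, a collection of `Ngram<3>` values carries no per-item length or pointer overhead, which is one way const generics can help with memory usage.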
I am trying to work out which of the memory-heavy areas can be trimmed a bit. For instance, I believe there is some redundancy in the hashmap...
@compenguy if you are curious, you can take a look at the ongoing benchmarks here: https://github.com/LucaCappelletti94/ngrammatic/tree/master/benchmarks
We are now down from the initial `7.875 GB` to `439.6 MB`! There is still plenty of room to go lower.
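To put those two numbers in perspective, here is a small sketch that computes the reduction. It assumes decimal units (1 GB = 1000 MB), which the figures above do not specify; the function names are made up for this example.

```rust
/// Sizes quoted in the thread, converted to megabytes.
/// Assumption: decimal units, i.e. 1 GB = 1000 MB.
pub const INITIAL_MB: f64 = 7.875 * 1000.0;
pub const CURRENT_MB: f64 = 439.6;

/// How many times smaller the current size is than the initial one.
pub fn reduction_factor(initial_mb: f64, current_mb: f64) -> f64 {
    initial_mb / current_mb
}

/// Share of the initial size that was saved, as a percentage.
pub fn saved_percent(initial_mb: f64, current_mb: f64) -> f64 {
    (1.0 - current_mb / initial_mb) * 100.0
}
```

Under that assumption, `reduction_factor(INITIAL_MB, CURRENT_MB)` is a little under 18, and `saved_percent(INITIAL_MB, CURRENT_MB)` is a little over 94.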
I am focusing first on getting the file size down to something manageable. The queries are also much faster now, but I still have to run a comprehensive benchmark for...