Precompute compressed lengths of the training data
Hi,
I came across this implementation. I had an idea to speed up the computations. I don't expect you to merge it.
By pre-computing and storing the compressed lengths of the training data, one deflate call can be avoided in the ncd function. I've observed a ~33% performance increase.
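Roughly what I mean, as a minimal zlib sketch (error handling omitted; `compressed_len` and `ncd_precomputed` are made-up names here, not the actual functions in this repo):

```c
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

/* Deflate buf[0..len) and return only the compressed size. */
static size_t compressed_len(const unsigned char *buf, size_t len) {
    uLongf out_len = compressBound(len);
    unsigned char *out = malloc(out_len);
    compress2(out, &out_len, buf, len, Z_DEFAULT_COMPRESSION);
    free(out);
    return (size_t)out_len;
}

/* NCD where c_x (the compressed length of training sample x) was precomputed
 * once up front, so each call only deflates y and the concatenation x||y. */
static double ncd_precomputed(const unsigned char *x, size_t x_len, size_t c_x,
                              const unsigned char *y, size_t y_len) {
    size_t c_y = compressed_len(y, y_len);

    unsigned char *xy = malloc(x_len + y_len);
    memcpy(xy, x, x_len);
    memcpy(xy + x_len, y, y_len);
    size_t c_xy = compressed_len(xy, x_len + y_len);
    free(xy);

    size_t min_c = c_x < c_y ? c_x : c_y;
    size_t max_c = c_x > c_y ? c_x : c_y;
    return ((double)c_xy - (double)min_c) / (double)max_c;
}
```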
Great project.
Thanks.
What about also precomputing the compressed lengths of the test data, while keeping the original text around (same for the training data)? (Possibly throwing some threads at that.) Then the only compression left per pair is the combined (concatenated) string, roughly as sketched below. I'm not too familiar with C, but I used a similar, albeit naive, approach in Kotlin, which is pathetically slow.
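Sticking with C to match the repo (so take this with a grain of salt), and reusing the hypothetical `compressed_len` from the comment above: with both compressed lengths cached, each pair costs a single deflate of the concatenation.

```c
/* Sketch, same caveats as above: c_x and c_y are both precomputed, so the only
 * deflate left per (training, test) pair is the concatenation x||y. */
static double ncd_both_precomputed(const unsigned char *x, size_t x_len, size_t c_x,
                                   const unsigned char *y, size_t y_len, size_t c_y) {
    unsigned char *xy = malloc(x_len + y_len);
    memcpy(xy, x, x_len);
    memcpy(xy + x_len, y, y_len);
    size_t c_xy = compressed_len(xy, x_len + y_len);
    free(xy);

    size_t min_c = c_x < c_y ? c_x : c_y;
    size_t max_c = c_x > c_y ? c_x : c_y;
    return ((double)c_xy - (double)min_c) / (double)max_c;
}
```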
[edit] I threw actual threads at it and got ~5 secs per test sample (still slow for me) using my suggestion. Will try SIMD next.
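The threading I mean, sketched in C with OpenMP rather than my Kotlin version (`Sample` and `ncd_row` are made-up names, `ncd_both_precomputed` is from the sketch above; build with -fopenmp):

```c
#include <stddef.h>

/* Hypothetical sample record: the raw text plus its precomputed compressed length. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t c_len;
} Sample;

/* Compute NCD from one test sample to every training sample in parallel. */
static void ncd_row(const Sample *train, size_t n_train,
                    const Sample *test, double *dists) {
    #pragma omp parallel for
    for (size_t i = 0; i < n_train; i++) {
        dists[i] = ncd_both_precomputed(train[i].data, train[i].len, train[i].c_len,
                                        test->data, test->len, test->c_len);
    }
}
```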