Improve Xpress and Xpress Huffman Compression Speed
These are currently about one-quarter the speed of the Rtl versions. They use the same dictionary which is the bottleneck.
Additionally, decompression code could be improved slightly (currently just slightly slower then the Rtl versions).
Has been improved with improvements to the dictionary, namely having a cutoff for number of chains to follow and good-enough lengths to stop after. But it isn't quite enough yet.
For Xpress it was a major improvement - we are up to ~82% of the speed of RTL (up from 26%). The compression ratio is just marginally better than RTL. However, by adjusting the compressor "level" (compile-time setting) we can get running faster than RTL (but slightly worse compression ratio). Here is the table of levels: (note that these don't use PGO so the speeds are slightly worse)
| Level | NiceLength | MaxChain | Speed (% of RTL) | CR (RTL is 40.009%) |
|---|---|---|---|---|
| 3 (default) | 48 | 11 | 77% | 40.007% |
| 2 | 32 | 8 | 86% | 40.428% |
| 1 | 16 | 4 | 112% | 41.655% |
By level 1 this implementation is much faster than RTL but the CR is worse (+1.6pp, or 4.4 MB of 269 MB).
However, it seems more likely that with a few other small improvements along with PGO, level 2 could almost be good enough and is a much smaller sacrifice in CR.
First off: awesome.
Would it be difficult to make the level a run-time setting? I'm planning on updating my benchmark soon (probably start over the weekend, but it will take all week on some of the less powerful computers), and if it's a run-time setting I can include all three levels. Besides, I would love to expose the options in Squash instead of choosing for the people consuming my API.
Eventually, yes. However I am still tweaking a lot of this (technically internally I have 8 levels, just changing NiceLength and MaxChain - equivalent concepts to the zlib library).
I am working on adding some of the other features that zlib has first. Also, I am going to try to run through a whole bunch of combinations of NiceLength/MaxChain (not just the ones I picked "out of a hat").
The question is then, how does an end user specify these settings? Xpress and Xpress Huffman will have these settings while LZNT1 will not have them.
The main goal of this library is to reproduce RTL as much as possible. RTL only has two levels (MAXIMUM and STANDARD, and technically HYBRID but that doesn't work).
I guess the "level" can be added as part of the format like RTL adds MAXIMUM and STANDARD:
const unsigned MSCOMP_XPRESS_DEFAULT_LEVEL = 0 << 4;
const unsigned MSCOMP_XPRESS_LEVEL_1 = 1 << 4;
const unsigned MSCOMP_XPRESS_LEVEL_2 = 2 << 4;
const unsigned MSCOMP_XPRESS_LEVEL_3 = 3 << 4;
ms_deflate_init(MSCOMP_XPRESS | MSCOMP_XPRESS_LEVEL_1, ...)
This way a different format can put something else in the top 28 bits of the format. One could even find a way to specify the NiceLength/MaxChain values in the format integer.
Another improvement in speed, this time mainly in Xpress Huffman.