
V2 - buffer interface, dictionary support, child process queue

Open Stieneee opened this issue 3 years ago • 4 comments

This version addresses the following issues with the original release:

  • The lack of a buffer interface.
  • The lack of obvious dictionary support. While the original interface exposed all zstd options as an array argument, it was not obvious to users that this was possible, and most would have had to reimplement the same tmp-file handling, so the package was extended to include more direct dictionary support.
  • A child process queue. When instantiating an instance of the SimpleZSTD class, the user of the package can request a child process queue. The class will spawn instances of ZSTD and hold them waiting for the next request. This, in theory, should reduce the latency related to spawning the child process (a sketch of this idea follows below).

The interface of the package has changed, hence the major version bump.
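For illustration, here is a minimal sketch of the pre-spawned child-process idea. It is not the package's actual internals; it assumes the zstd CLI is available on the PATH and uses only Node built-ins.

```typescript
// Sketch only: keep a few zstd child processes warm so a request never pays
// the spawn cost. Assumes the `zstd` CLI is on the PATH; names are illustrative.
import { spawn, ChildProcessWithoutNullStreams } from 'child_process';

class ZstdPool {
  private idle: ChildProcessWithoutNullStreams[] = [];

  constructor(private readonly poolSize: number, private readonly level = 3) {
    for (let i = 0; i < poolSize; i += 1) this.idle.push(this.spawnOne());
  }

  // Each process compresses stdin to stdout at the requested level.
  private spawnOne(): ChildProcessWithoutNullStreams {
    return spawn('zstd', [`-${this.level}`, '-c']);
  }

  // Hand out a waiting process and immediately spawn a replacement so the
  // pool stays at its target size.
  acquire(): ChildProcessWithoutNullStreams {
    const proc = this.idle.shift() ?? this.spawnOne();
    this.idle.push(this.spawnOne());
    return proc;
  }
}

// Usage: pipe a buffer through an already-running process.
async function compressWithPool(pool: ZstdPool, input: Buffer): Promise<Buffer> {
  const proc = pool.acquire();
  const chunks: Buffer[] = [];
  proc.stdout.on('data', (chunk: Buffer) => chunks.push(chunk));
  const done = new Promise<Buffer>((resolve, reject) => {
    proc.on('error', reject);
    proc.on('close', (code) => {
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`zstd exited with code ${code}`));
    });
  });
  proc.stdin.end(input);
  return done;
}
```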

Stieneee • Aug 07 '22 20:08

This looks very good!

Great API and implementation from what I can tell.

Seems to work well for smaller sample sizes.

I am currently running into an error on Windows benchmarking this with large amounts of data (over 20,000 json objects):

(node:16320) UnhandledPromiseRejectionWarning: Error: EMFILE: too many open files, open 'C:\Users\user\AppData\Local\Temp\tmp-16320-OkDKhqWe4PZf'

I might come back later to this and try to figure out what's going on.

Perhaps too many child processes are spawned if I pass no parameter for it?

EDIT: In any case, it might be a good idea to set a hard limit on the number of spawned child processes, perhaps 32 or something. Not saying the error necessarily originated because of this.

Awendel • Aug 08 '22 09:08

@Awendel Wow, you found this before I could put together a response on the other thread.

I'm happy with how the implementation came together. The only thing that I would change is maybe the ordering of params on the various functions. It seems clunky to have to specify several empty objects and an array just to specify the dictionary. Thoughts? Maybe turn the interface into a single object?

The tmp files are for dictionaries. Are you using the SimpleZSTD class or one of the exported functions? If you are using the class, it may be a sign of a bug. My intention with the class is that the dictionary file should only be written once when the class is instantiated and reused for every process spawned by the class.
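To illustrate that intent (a sketch with hypothetical names, not the actual code): the dictionary buffer gets written to a temporary file once at construction, and the same path is reused for every spawn.

```typescript
// Sketch of the intended lifecycle: one tmp file per class instance, written
// once at construction. Names here are hypothetical, not simple-zstd's API.
import { mkdtempSync, writeFileSync, rmSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

class DictionaryHolder {
  readonly dictPath: string;
  private readonly dir: string;

  constructor(dictionary: Buffer) {
    this.dir = mkdtempSync(join(tmpdir(), 'zstd-dict-'));
    this.dictPath = join(this.dir, 'dictionary');
    writeFileSync(this.dictPath, dictionary); // written exactly once
  }

  // Every subsequent spawn can pass `-D ${this.dictPath}`; the file is never
  // rewritten per call.
  destroy(): void {
    rmSync(this.dir, { recursive: true, force: true });
  }
}
```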

EDIT: In any case, it might be a good idea to set a hard limit on the number of spawned child processes, perhaps 32 or something. Not saying the error necessarily originated because of this.

Interesting idea. Since all interfaces are now promise-based, it would be possible to queue the requests. Should I do this per instance of the class or package-wide?
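For example, something along these lines (a sketch only; compress here stands in for whichever promise-based call gets queued):

```typescript
// Sketch of a concurrency cap for promise-based calls. Not part of the
// package; shown only to illustrate the queueing idea.
class TaskQueue {
  private running = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.running >= this.limit) {
      // Park the caller until a running task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.running += 1;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next(); // pass the slot directly to the next waiter
      else this.running -= 1;
    }
  }
}

// Usage with a hypothetical promise-based compress(buffer) call:
// const queue = new TaskQueue(32);
// const compressed = await queue.run(() => compress(buffer));
```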

Stieneee • Aug 09 '22 04:08

Yes, changing the optional parameters to a single (optional) config object where each entry is also optional and has a sane default is a great idea (in TypeScript this would be Partial<Type>). So the API would behave more like: compress(buffer, config) -> Promise
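Roughly this shape, for example (field names below are just placeholders, not a naming proposal):

```typescript
// Hypothetical shape for a single-options-object API; every field is optional
// via Partial and gets a sane default inside the library.
interface CompressConfig {
  level: number;        // zstd compression level
  dictionary: Buffer;   // raw dictionary bytes
  spawnOptions: object; // passed through to child_process.spawn
}

declare function compress(
  input: Buffer,
  config?: Partial<CompressConfig>,
): Promise<Buffer>;

// Callers only name what they care about:
// await compress(buffer, { dictionary: dictBuffer });
```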

Regarding the bug / dictionaries: I see now the nature of the problem. I did indeed use the non-class version to test. If it generates a file for each invocation, that explains the error (since it would lead to too many open file handles). Later I will change it to the class version and test again.

Here are a few thoughts: (1) How exactly is the dictionary passed on to zstd, and why does it have to be via a file? Is there no way to do this in memory? I only ask because this would introduce substantial latency (if the zstd process has to perform a disk read before every invocation, that overhead alone would keep it out of the same ballpark as, for example, Snappy). (EDIT: If the file only has to be generated / loaded from disk when the child process is first created and is reused for subsequent compress calls, then of course this is not a problem anymore.)

(2) If a file truly needs to be generated, then for the non-class method there is a way to avoid generating a tmp file for each invocation: using a fast hash function, generate a hash of the dictionary buffer and store it in a lookup map alongside the path to the temporary file generated. On the next invocation, compute the hash again and, if it matches a previous dictionary, just look up the path instead of generating a new tmp file from scratch (see the sketch after point (3)).

(3) The official zstd repo / website says there are two ways to use dictionaries: (a) a one-off API (slower if used often) and (b) a reusable API that parses the dictionary once and reuses it on subsequent calls. Does the current (class) implementation make use of that second underlying API?
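A sketch of idea (2), using only Node built-ins (sha256 here just because it is built in; any fast hash would do):

```typescript
// Cache the tmp file path per dictionary, keyed by a hash of the dictionary
// buffer, so repeated non-class calls reuse one file. Illustrative only.
import { createHash } from 'crypto';
import { mkdtempSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

const dictCache = new Map<string, string>(); // hash -> tmp file path

function dictionaryPath(dictionary: Buffer): string {
  const key = createHash('sha256').update(dictionary).digest('hex');
  const cached = dictCache.get(key);
  if (cached) return cached; // already written once, just reuse the path

  const dir = mkdtempSync(join(tmpdir(), 'zstd-dict-'));
  const filePath = join(dir, key);
  writeFileSync(filePath, dictionary);
  dictCache.set(key, filePath);
  return filePath;
}
```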

Regarding queues: this might be a good idea, but in any case it's probably good to do some benchmarks factoring in the following:

  • latency
  • memory usage
  • stability (especially on low-end hardware, e.g. whether it crashes if too many child processes are launched, etc.)

Awendel • Aug 09 '22 09:08

I just rewrote my benchmark to use the Class interface ... and unfortunately it is not looking good so far. It seems to perform exponentially worse the more samples I test. Up to 100 json objects looks fine, but if I go to 1000 or 10,000 the latency goes up exponentially, as though there is some sort of bottleneck.

Would be good to debug exactly what goes on at each step and what makes the process stall. I am happy to assist further from a theoretical viewpoint.

I guess one would have to look at all the stages the data travels through: from the buffer, passing it on to the child process, to zstd, and from stdout back to Node, etc.

In any case, for your own tests I recommend working with json objects of around 1-3 KB, having at least 3000 of them or more in an array, compressing them all / decompressing them all, and measuring latency.
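A rough harness along those lines (compress and decompress are placeholders for whichever library is under test):

```typescript
// Rough benchmark harness: ~1-3 KB JSON objects, compress and decompress them
// all, measure each stage. The codec is a stand-in for the library under test.
type Codec = {
  compress: (input: Buffer) => Promise<Buffer>;
  decompress: (input: Buffer) => Promise<Buffer>;
};

async function benchmark(codec: Codec, count = 3000): Promise<void> {
  const samples = Array.from({ length: count }, (_, i) =>
    Buffer.from(JSON.stringify({ id: i, payload: 'x'.repeat(1000 + (i % 2000)) })),
  );

  const t0 = Date.now();
  const compressed = await Promise.all(samples.map((s) => codec.compress(s)));
  const t1 = Date.now();
  await Promise.all(compressed.map((c) => codec.decompress(c)));
  const t2 = Date.now();

  console.log(`compress:   ${t1 - t0} ms for ${count} objects`);
  console.log(`decompress: ${t2 - t1} ms for ${count} objects`);
}
```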

In my own test I have around 20,000 of those objects. For reference, some speeds:

Snappy (npm snappy) does each stage in around 200ms (which will be very hard to beat), and the most stable zstd package I have tested so far (zstd-codec) does it in around 700ms per stage (slower than Snappy, but by far the fastest zstd lib for Node I could find).

I couldn't complete my benchmark with your new class interface; the highest I could manage was reducing it to 1000 objects, and even then each stage already takes more than 30 seconds. So there definitely must be a bottleneck somewhere that holds performance back.

Awendel • Aug 09 '22 10:08