Maybe the performance of the output buffer can be improved.
Please see CPython issue 41486.
It seems python-zstandard also has buffer-resizing overhead, but I don't know how big the impact is:
https://github.com/indygreg/python-zstandard/blob/7f0a35f4dc0b5a5317c504c4dc78f25792f8cd5f/zstd.c#L325-L344
The design of Python's compression APIs is not conducive to good performance as they require buffer reallocations/resizing. This is exactly why python-zstandard offers numerous APIs beyond what Python's standard library offers.
Could our standard-library-compatible API's performance be improved? Possibly. But the last time I benchmarked it, it didn't seem to be a horrible problem. IMO, if people care about performance they should use an API with better performance properties.
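For example, python-zstandard's one-shot API allocates the output buffer a single time instead of growing it incrementally. A minimal, self-contained sketch (the 1 MiB payload is illustrative):

import zstandard

# A 1 MiB payload, purely illustrative.
dat = zstandard.ZstdCompressor().compress(b'a' * 1024 * 1024)

dctx = zstandard.ZstdDecompressor()
# One-shot decompress allocates the output buffer once, sized from the
# content size recorded in the frame header (pass max_output_size when
# the size isn't embedded). No incremental resizing occurs.
dat1 = dctx.decompress(dat)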
This code is very slow:

import time
import zstandard

# dat is the zstd-compressed input; its preparation is sketched below
dctx = zstandard.ZstdDecompressor()
d = dctx.decompressobj()
t1 = time.perf_counter()
dat1 = d.decompress(dat)
t2 = time.perf_counter()
print(t2 - t1)
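For reference, dat can be prepared like this (a sketch; mb is the decompressed size in MiB, as in the results below):

import zstandard

mb = 100
dat = zstandard.ZstdCompressor().compress(b'a' * (mb * 1024 * 1024))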
Results (the first column is the decompressed size in MB; the decompressed data is b'a' * (mb*1024*1024)):
MB seconds
10 0.116
20 0.410
30 0.908
40 1.724
50 2.649
60 3.873
70 5.369
80 7.123
90 9.008
100 11.375
If the cost of resizing is eliminated, it can reach this speed (see the sketch after the table):
MB seconds
10 0.007
20 0.011
30 0.015
40 0.022
50 0.029
60 0.033
70 0.039
80 0.045
90 0.052
100 0.058
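For comparison, a measurement like this could be reproduced with pyzstd's one-shot API, which uses the output buffer code mentioned below. A sketch; the 100 MiB payload matches the benchmark above:

import time
import pyzstd

dat = pyzstd.compress(b'a' * (100 * 1024 * 1024))
t1 = time.perf_counter()
dat1 = pyzstd.decompress(dat)
t2 = time.perf_counter()
print(t2 - t1)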
If you want to use pyzstd's output buffer code, feel free to use it. It is very simple in nature: just initialize, grow, finish, and error-handling functions.
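The idea is to collect output in a list of blocks and join them once at the end, so already-written data is never copied when the buffer grows. A conceptual Python sketch of that pattern (pyzstd's real implementation is in C; the class name and block sizes here are illustrative):

class BlocksOutputBuffer:
    # Later blocks are larger, so big outputs need only a few blocks.
    BLOCK_SIZES = [32 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024,
                   4 * 1024 * 1024, 16 * 1024 * 1024, 32 * 1024 * 1024]

    def __init__(self):
        # initialize: allocate the first block
        self.blocks = [bytearray(self.BLOCK_SIZES[0])]

    def grow(self):
        # grow: append a new block; previously written data is not moved
        i = min(len(self.blocks), len(self.BLOCK_SIZES) - 1)
        self.blocks.append(bytearray(self.BLOCK_SIZES[i]))

    def finish(self, last_block_used):
        # finish: trim the unused tail of the last block, then copy once
        self.blocks[-1] = self.blocks[-1][:last_block_used]
        return b"".join(self.blocks)

The decompressor writes into the last block; when it fills up, grow() is called; at end of stream, finish() assembles the result with a single copy, instead of the roughly O(n^2) copying that repeated resizing can cause.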