
Saving/reusing zstd dictionary

Open colinxs opened this issue 4 years ago • 0 comments

zstd supports creating a dictionary from a set of files, which can then be used to speed up compression and improve the compression ratio on subsequent runs. Looking at the code in token.c and match.c (please correct me if I'm wrong here), it appears that a new dictionary is used for each file. Naively, it seems like that dictionary could be shared across all files in the tree. Based on some benchmarking with a large set of TOML files, the performance increase from using a dictionary is significant.

Here is a discussion of this idea: https://unix.stackexchange.com/questions/553111/is-the-rsync-block-compression-dictionary-reset-for-each-file

As an extension to reusing the dictionary across files within a single call to rsync, a user could (optionally) provide an external dictionary or reuse one that rsync generates (similar to Batch Mode and --write-batch/--read-batch).
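
To make the idea concrete, here is a minimal sketch of why a shared dictionary helps with many small, similar files. It is not rsync's code and it does not use zstd: as a stand-in it uses the preset-dictionary support in Python's standard zlib module, which works on the same principle. The sample TOML-like content, the helper names, and the choice of "dictionary" (a few representative files concatenated) are all assumptions made for illustration.

```python
import zlib


def make_file(i: int) -> bytes:
    """Hypothetical sample data: small TOML-like files that share most keys."""
    return (
        f'[package]\nname = "pkg{i}"\nversion = "0.{i}.0"\n'
        f'authors = ["someone@example.com"]\n\n'
        f'[dependencies]\nserde = "1.0"\n'
    ).encode()


files = [make_file(i) for i in range(20)]

# Stand-in dictionary: the raw bytes of a few representative files.
# (zstd would normally train a dictionary; zlib just takes raw bytes.)
shared_dict = b"".join(files[:3])


def compress_fresh(data: bytes) -> bytes:
    """Compress one file with no prior context, as if state were reset per file."""
    c = zlib.compressobj(level=6)
    return c.compress(data) + c.flush()


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress one file against a dictionary shared by all files."""
    c = zlib.compressobj(level=6, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """The receiver needs the exact same dictionary to decompress."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


# Compare total compressed size over the files not used to build the dictionary.
fresh_total = sum(len(compress_fresh(f)) for f in files[3:])
dict_total = sum(len(compress_with_dict(f, shared_dict)) for f in files[3:])

print("per-file, no dictionary:", fresh_total, "bytes")
print("per-file, shared dictionary:", dict_total, "bytes")
```

Note that both sides have to hold an identical dictionary, which is why saving and reloading it (or supplying an external one) would matter for repeated syncs.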

Both of these things would help significantly for something like real-time sync (as lsyncd, which is basically inotify + rsync, does) where I'm continuously rsyncing a file tree across a network using compression.

colinxs, May 02 '21 19:05