Loading corpus inputs from a (compressed) archive.

Open zhenyudg opened this issue 3 years ago • 1 comment

As a result of increased fuzzing in our organization, we now have thousands of fuzz corpus files (admittedly a good problem to have :) ). Currently, we store each corpus file individually and pass them all to cc_fuzz_test via corpus = glob(["my-corpus-directory/*"]).

Given the proliferation of corpus files, we are interested in storing fuzz corpora in compressed archives (say, as a single corpus.tar.gz for every cc_fuzz_test). Are there existing Bazel rules that can help us feed a compressed archive to cc_fuzz_test's corpus parameter? Alternatively, can we extend rules_fuzzing to support, say, corpus_archives = ["corpus.tar.gz"]?

Nov 10 '22 20:11 zhenyudg

I'm not 100% sure, but I believe you should be able to define a custom rule that extracts a .tar.gz archive and produces a directory output (so all the extracted files would be written there). Then you can instantiate the rule and pass the resulting target to the corpus attribute of the fuzz target.
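
For illustration, a minimal sketch of such a rule might look like the following. The file name corpus_archive.bzl, the rule name corpus_from_archive, and its archive attribute are hypothetical, and the sketch assumes a tar binary with gzip support is available where the action runs. Whether cc_fuzz_test's corpus attribute handles a directory output correctly has not been verified here.

```starlark
# corpus_archive.bzl (hypothetical file name)

def _corpus_from_archive_impl(ctx):
    # Declare a directory (tree artifact) output named after the target.
    out_dir = ctx.actions.declare_directory(ctx.label.name)

    ctx.actions.run_shell(
        inputs = [ctx.file.archive],
        outputs = [out_dir],
        # $1 is the archive path, $2 is the output directory.
        # Assumption: `tar` with gzip support exists on the execution host.
        command = 'mkdir -p "$2" && tar -xzf "$1" -C "$2"',
        arguments = [ctx.file.archive.path, out_dir.path],
        mnemonic = "ExtractCorpus",
        progress_message = "Extracting fuzz corpus %s" % ctx.file.archive.short_path,
    )

    # Expose the extracted directory as the target's default output.
    return [DefaultInfo(files = depset([out_dir]))]

corpus_from_archive = rule(
    implementation = _corpus_from_archive_impl,
    attrs = {
        "archive": attr.label(
            allow_single_file = [".tar.gz", ".tgz"],
            mandatory = True,
        ),
    },
)
```

A BUILD file could then wire it up roughly like this (target and source names are placeholders):

```starlark
load(":corpus_archive.bzl", "corpus_from_archive")
load("@rules_fuzzing//fuzzing:cc_defs.bzl", "cc_fuzz_test")

corpus_from_archive(
    name = "my_corpus",
    archive = "corpus.tar.gz",
)

cc_fuzz_test(
    name = "my_fuzz_test",
    srcs = ["my_fuzz_test.cc"],
    corpus = [":my_corpus"],
)
```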

There is also the alternative of extracting the archive as a repository rule, documented here: https://stackoverflow.com/questions/46326749/how-do-i-unzip-a-file-in-bazel-properly-if-i-dont-know-the-contents-of-the-zip
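
A rough sketch of the repository-rule approach, under the same caveats: corpus_repo and its archive attribute are made-up names, and depending on the Bazel version the generated repository may also need a WORKSPACE or REPO.bazel marker file. Because extraction happens at repository fetch time, the individual files are visible to glob, unlike with a directory output.

```starlark
# corpus_repo.bzl (hypothetical file name)

_BUILD_FILE = """\
filegroup(
    name = "corpus",
    srcs = glob(
        ["**"],
        exclude = ["BUILD.bazel", "WORKSPACE", "REPO.bazel"],
    ),
    visibility = ["//visibility:public"],
)
"""

def _corpus_repo_impl(rctx):
    # Unpack the archive into the root of the new external repository.
    rctx.extract(rctx.attr.archive)

    # Expose every extracted file through a single filegroup.
    rctx.file("BUILD.bazel", _BUILD_FILE)

corpus_repo = repository_rule(
    implementation = _corpus_repo_impl,
    attrs = {
        "archive": attr.label(
            allow_single_file = True,
            mandatory = True,
        ),
    },
)
```

After instantiating corpus_repo(name = "my_corpus", archive = ...) from the WORKSPACE file, a fuzz target would reference the files as corpus = ["@my_corpus//:corpus"].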

But I can also see merit in supporting corpus archives natively, e.g., through a corpus_archive attribute, so this is a reasonable feature request (PRs welcome, too! 😁 ).

Nov 12 '22 23:11 stefanbucur