There doesn't seem to be a way to exclude corpora from artifacts

Open evverx opened this issue 3 years ago • 7 comments

From https://github.com/google/oss-fuzz/pull/7186#issuecomment-1025000134

I think CFLite should pass UPLOAD_BUILD to build_fuzzers to make it possible to exclude those large corpora from artifacts. Without that environment variable, I can't skip that step with something like:

if [[ "$MERGE_WITH_OSS_FUZZ_CORPORA" == "yes" ]]; then

    # When the latest builds are uploaded by CFLite the large OSS-Fuzz corpora
    # should be excluded regardless of whether MERGE_WITH_OSS_FUZZ_CORPORA
    # is set to "yes" or not.
    [[ "$UPLOAD_BUILD" == "true" ]] && exit 0
    ....
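For illustration, the same guard can be written as a small predicate. This is only a sketch: it assumes CFLite exports UPLOAD_BUILD=true to the build script for builds it is about to upload, which is what this issue asks for and not something it does today, and the function name is hypothetical.

```shell
# Hypothetical predicate (not part of CFLite): succeeds only when the large
# OSS-Fuzz corpora should be merged into the build output.
# Assumption: CFLite sets UPLOAD_BUILD=true for builds it is going to upload.
should_merge_oss_fuzz_corpora() {
    # Merging is opt-in via MERGE_WITH_OSS_FUZZ_CORPORA=yes.
    [[ "${MERGE_WITH_OSS_FUZZ_CORPORA:-}" == "yes" ]] || return 1
    # Never merge into a build that will be uploaded as an artifact.
    [[ "${UPLOAD_BUILD:-}" == "true" ]] && return 1
    return 0
}
```

A build script could then wrap the merge step in `if should_merge_oss_fuzz_corpora; then ... fi`.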

evverx • Jan 30 '22 02:01

I'm not sure why corpora are included in those artifacts in the first place. I think it would probably make sense to always remove all the "*_seed_corpus.zip" files before uploading them.
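Roughly something like the sketch below, run on the build output directory before the upload step. The function name is made up, and it assumes the archives sit at the top level of that directory next to the fuzz targets.

```shell
# Hypothetical helper (not part of CFLite): delete seed corpus archives from
# a build output directory so they don't end up in the uploaded artifact.
remove_seed_corpora() {
    local out_dir="$1"
    # Assumption: the "*_seed_corpus.zip" files live directly in the output
    # directory, so there is no need to recurse.
    find "$out_dir" -maxdepth 1 -type f -name '*_seed_corpus.zip' -delete
}
```

It would be called as `remove_seed_corpora "$OUT"`; fuzz target binaries and other files in the directory are left alone.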

evverx • Jan 30 '22 03:01

> I'm not sure why corpora are included in those artifacts in the first place. I think it would probably make sense to always remove all the "*_seed_corpus.zip" files before uploading them.

How much does this matter to you? I don't want to add too many smart features like this, since they add complexity.

jonathanmetzman • Feb 01 '22 15:02

Those corpora take up about 400 MB (when they are compressed) and that's just too much, I think. I can't upload huge artifacts like that on every commit (given that they are kept for 3 months by default).

evverx • Feb 01 '22 16:02

> Those corpora take up about 400 MB (when they are compressed) and that's just too much, I think. I can't upload huge artifacts like that on every commit (given that they are kept for 3 months by default).

I'm going to try to fix this retention policy issue. Lemme send a PR deleting the seed corpora.

jonathanmetzman • Feb 02 '22 20:02

@jonathanmetzman on second thought, given that the size of those corpora can be controlled by scripts running on PRs, I don't think they should even be uploaded. It took some time, and I kind of DoSed myself with the public OSS-Fuzz corpora accidentally, but it's possible to just open a PR, replace "code-change" with "batch" and put giant files in "$OUT/" to somewhat speed up this process. I still have no idea why GitHub allows that with read-only tokens, but it is what it is, apparently.

evverx • Feb 09 '22 11:02

I think we can accomplish what you are asking for by allowing you to set FILESTORE=no_filestore (should be an easy fix) and by using https://github.com/google/clusterfuzzlite/pull/93.

jonathanmetzman • Apr 01 '22 00:04

I can't seem to wrap my head around it. I think if it were possible to set FILESTORE=no_filestore without #93, that should do as well. I think I'd need #93 if I used the "batch" mode, but I use only the "code-change" mode and additionally am planning to turn on continuous builds.

evverx • Apr 01 '22 05:04