Support large test suites

Open ben-manes opened this issue 3 years ago • 4 comments

My build runs ~9M tests per matrix item, which results in very large JUnit test result files. When experimenting, this action failed due to heap exhaustion.

I am currently using publish-unit-test-result-action, which had the same issue. Prior to their fix, I used some bash to truncate each result file down to its testsuite lines:

find . -type f -name '*.xml' -exec sh -c 'grep testsuite {} > {}.out && mv {}.out {}' \;

ben-manes avatar May 17 '22 02:05 ben-manes

That trick didn’t work, unfortunately. The parser did not take the summary line and instead found no tests.

ben-manes avatar May 17 '22 03:05 ben-manes

Yikes. Thanks for opening the issue @ben-manes. The parser that I used is pretty naive; it was a quick & dirty attempt. I can switch to a streaming parser, and that should be able to parse very large XML documents.

There's still an opportunity to exhaust the heap if you had (say) 9M test failures, each with a very long stack trace or something. We should be mindful of that as well: put a cap on the number of failures that we collect for the output table, and truncate beyond that.
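To make the two ideas above concrete (stream the XML, and cap the failures kept for the table), here is a minimal sketch in Python rather than in the action's own code. It is not the action's implementation: `summarize`, `MAX_FAILURES`, and the 200-character message limit are hypothetical names and values chosen for illustration, and it assumes failures appear as `failure` or `error` children of each `testcase`.

```python
import io
import xml.etree.ElementTree as ET

MAX_FAILURES = 100  # hypothetical cap on failures kept for the output table


def summarize(source, max_failures=MAX_FAILURES):
    """Stream a JUnit XML report, counting every test case but keeping
    details for at most `max_failures` failures."""
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    failures = []
    stack = []  # currently open elements, so a finished testcase can be dropped
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()
        if elem.tag != "testcase":
            continue
        problem = elem.find("failure")
        if problem is None:
            problem = elem.find("error")
        if problem is not None:
            counts["failed"] += 1
            if len(failures) < max_failures:
                # Keep only a short message, never the full stack trace.
                message = (problem.get("message") or "")[:200]
                failures.append((elem.get("name"), message))
        elif elem.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            counts["passed"] += 1
        if stack:
            # Detach the finished testcase so the tree does not keep growing.
            stack[-1].remove(elem)
    return counts, failures
```

Counting continues past the cap, so the totals stay accurate even when the failure list is truncated.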

As for why your truncation didn't work, I was examining each testcase since my understanding is that failures and skipped are optional in the XML. However, rethinking this, I suspect that what they mean is that the failures property is optional and, if omitted, means that there are no failures. I thought that I had found a reporter that would omit the failures property, but now I'm suspicious of my memory. I'll go through this again. If every reporter in common usage always shows a failures property on a testsuite when there are actually failures, then we don't need to bother with testcases unless you want more detailed output. So a mode that doesn't show the test cases table would not need to parse testcases at all. This would be a quick & dirty solution that would make your truncation work.
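A summary-only mode along those lines could be sketched as follows, again in Python rather than the action's own code. It reads only the attributes on each `testsuite` start tag and builds no tree, and it treats a missing count attribute as zero, which is exactly the assumption about reporters questioned above. `suite_totals` is a hypothetical name, and nested `testsuite` elements that repeat their children's counts would be double counted.

```python
import io
import xml.parsers.expat


def suite_totals(stream):
    """Sum the count attributes on every <testsuite> start tag.

    `stream` is a binary file-like object. Test cases are never
    inspected, so memory use does not depend on how many there are.
    """
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}

    def on_start(name, attrs):
        if name == "testsuite":
            for key in totals:
                # Assumption: an omitted attribute means a count of zero.
                totals[key] += int(attrs.get(key, 0))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.ParseFile(stream)
    return totals
```

For a report on disk, this would be called as `suite_totals(open(path, "rb"))`, where `path` is a placeholder for the result file.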

ethomson avatar May 17 '22 13:05 ethomson

I'm experimenting with a matrix myself and it seems to work fine: https://github.com/crazy-max/buildkit/blob/dc1335ba667df6977e9b02440566c77b297632c7/.github/workflows/build.yml#L224-L275

Result: https://github.com/moby/buildkit/actions/runs/2498837572#summary-6895289837

I'm also using another tool called teststat to have other useful info. Might be cool to have it here too.

crazy-max avatar Jun 15 '22 13:06 crazy-max

We're having the same issue with some builds triggering resource exhaustion due to a very large matrix. See for example this run: https://github.com/camunda/zeebe/actions/runs/7895500367/job/21547941014

A couple of things that could be factors: test result files for these tend to be huge (like ~350MB), but generally the number of tests is low-ish (like < 1k).

npepinpe avatar Feb 14 '24 10:02 npepinpe