[BUILD] Try to fasten the build on CI
Please document time gains (before/after). Also keep in mind that build time gains shall not decrease build stability...
Not sure how it's supposed to fasten the build either? And I agree, our stability is already not great...
The idea here is: you can use parallel=methods: classes are executed in forkCount concurrent processes, each of the processes can then use threadCount threads to execute the methods of one class in parallel.
I also agree that if this experiment adds more instability to the build, we should not add it.
Yeah I'm really not sure doing more things in parallel make the build more stable^^' Can test though
threadCount= 2:
- before: 4h 48m
- after: 4h 31m
Not much diff, could just be because of the ci workload state. Will try with another value.
The build already uses T1C meaning that it attempts to parallelize the build at module level trying to build one module per hyperthreaded core. for each module build there is already parallelization of the tests on the class level (though I don't remember the exact threadcount) it is hard to tell without more precise metrics (not sure we could get them on such an infra) but trying to parallelize further at the method level is unlikely to yield massive improvments :
- many of james tests are CPU bound not IO bound (io bound tests would be integration tests that start databases)
- we already have massive parallelization in place (possibly even too much)
adding more parallelization could result in more context switching resulting in slower test execution for CPU bound tests
unfortunately I am not aware of a way to better optimize parallelization with maven. As far as i know, maven and surefire don't communicate at all. with gradle builds you can define an overall limit to workers and the scheduling is handled by the gradle daemon so you can maximize parallelization of both tests and compile tasks. But james doesn't build with gradle :)
A large issue for james is that the build is very naive and rebuilds a lot of useless stuff for every commit very few commits actually require rebuilding the whole codebase ... not sure if we can improve that
there may be ideas to find in https://eng.uber.com/how-we-halved-go-monorepo-ci-build-time/ they use a different build tool but the overall problem is the same
Hi Jean, thank you for your suggestion.
The build already uses T1C meaning that it attempts to parallelize the build at module level trying to build one module per hyperthreaded core. for each module build there is already parallelization of the tests on the class level (though I don't remember the exact threadcount)
I checked our Jenkinsfile. We just use T1C for the compilation, not for the test phase. Maybe adding T1C to the maven test command could improve the testing time, but I guess the rationale why we did not use that before could have been because of resource fairness between the build...
02:56:59.515 [ERROR] 🐳 [cassandra_3_11_10-597500a2-330a-4e34-a007-dd1c815acc50] - failed to get digest sha256:07aee99ea52e391d5a13faf4f24a5053814c1b6bad8e01fd29c5ceeb5e285080: open /var/lib/docker/image/overlay2/imagedb/content/sha256/07aee99ea52e391d5a13faf4f24a5053814c1b6bad8e01fd29c5ceeb5e285080: no such file or directory
Keep having this recently. This is likely a problem with docker image cache storage. Not sure if it's related to the new concurrency or not.
- Solution 1: clean the docker image storage (a bit dangerous cause we don't know what are being used on CI worker)
- Solution 2: build image with
--no-cacheoption. However could not find it in test container.
This work looks stale.
@quantranhong1999 can you please summarize your appoach (parallel execution of maven projects instead of relying on surefire fork counts, if I followed well) and its achievment both in term of build time and stability.
This is important as we can use that to see if it is a direction we should keep investigating...
Can you please summarize your appoach (parallel execution of maven projects instead of relying on surefire fork counts, if I followed well) and its achievment both in term of build time and stability.
My idea is to increase the parallelism by adding parallel executions in Maven parallel builds (this works with surefire fork counts, not replace it). Maven surefire plugin doc (https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html#forked-test-execution):
You can use the place holder ${surefire.forkNumber} within argLine, or within the system properties (both those specified via mvn test -D... and via systemPropertyVariables). Before executing the tests, the surefire plugin replaces that place holder by the number of the actually executing process, counting from 1 to the effective value of forkCount times the maximum number of parallel executions in Maven parallel builds, i.e. the effective value of the -T command line argument of Maven core.
So this idea will actually create more forks. E.g before: forkCount = 1 => 1 fork after: mvn test -T 8 ... with fork count = 1 => 8 forks.
Achievement until now:
- Build time: decrease much (stable test before 4h -> after ~ 1.5h)
- Stability: sadly decreased. It creates more unstable tests, especially some docker tests (LDAP as I observed).
Our current test suite is likely not sustainable that much concurrency. I did try to reduce concurrency by using fewer CI worker CPU cores but still did not archive a green build.
Specifying threadCount = 1 on docker tests module may help, but I am not sure it will works.
Any plan to carry on this work?
Should we close this, @quantranhong1999 ?
Any plan to carry on this work?
Likely no, I think adding more parallelism to the CI is not the right way but brings more instability...
Should we close this, @quantranhong1999 ?
Yes, I will close this.