Investigate CI performance
Problem Definition
In the last couple of months our CI performance has noticeably decreased. The current average workflow runtime seems to be 20+ minutes.
Ideally we want to return to the 10-15 min range.
Things to investigate:
- Where is time being spent?
- Workspace up-/download
- This seems relatively slow esp. compared to cache
- Maybe move some files (e.g. the Python venv) to the cache and only keep config / settings in the workspace
- Caching (is it working as intended?)
- Are we being throttled by concurrency limits? (Conversation with Circle @karlb @czepluch)
- Can we improve test run-time by further parallelization? (Expect diminishing returns due to spin-up overhead.)
- Can the integration tests be optimized?
E.g.:
- Possibility to not restart eth node / synapse / etc. for every test
- Do the contracts have to be deployed for every test?
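To make the "diminishing returns" point about parallelization concrete, here is a rough sketch. It assumes an idealized model in which every parallel container pays the same fixed spin-up cost and the tests split evenly across containers. The function name and all numbers are hypothetical, not measurements from our CI.

```python
def estimated_wall_time(total_test_minutes: float,
                        spin_up_minutes: float,
                        workers: int) -> float:
    """Idealized wall-clock time: fixed spin-up plus an even share of the tests."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return spin_up_minutes + total_test_minutes / workers


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    total, spin_up = 40.0, 3.0
    for n in (1, 2, 4, 8, 16):
        minutes = estimated_wall_time(total, spin_up, n)
        print(f"{n:>2} workers: {minutes:.1f} min")
```

Under this model each doubling of workers saves half as much as the previous one, and the wall time never drops below the spin-up cost. So reducing the spin-up work itself (workspace download, node restarts) may pay off more than adding containers.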
I can set up a meeting with CircleCI if we believe that would be helpful.
> Are we being throttled by concurrency limits? (Conversation with Circle @karlb @czepluch)
Yes, we are. I have been contacted by a CircleCI employee who would like to talk to us about it. I'll forward the contact details to both of you again.
Today we discussed the possibility of bumping the required Python version to 3.8 (@palango, @ezdac). If we decide to go that way, we can simply remove all the Python 3.7 stuff from our CI, which I guess would help somewhat.