Instances with many benchmarks are slow to process POST to /result/add/
In my system I have 400+ unique benchmarks. Every time a new result is POSTed, the application does a select against the codespeed_result table for every benchmark_id (over 400 select calls). It appears to be doing this to collect all results so it can update the codespeed_reports table.
This doesn't scale. As more and more benchmarks are added this is just going to get slower and slower.
One potential fix is to replace the 400+ selects with a single call to the DB to pull all of these results into memory at once for processing. This has some implications for overall memory footprint, but like anything it is a trade-off (space versus time).
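Something along these lines would do it. This is only a sketch, and the Result field names used here (revision, executable, environment, benchmark, value) are assumptions about the codespeed schema:

from collections import defaultdict
from codespeed.models import Result

def results_by_benchmark(revision, executable, environment):
    # One SELECT covering all benchmarks instead of one SELECT per benchmark;
    # the grouping then happens in memory.
    grouped = defaultdict(list)
    rows = Result.objects.filter(revision=revision,
                                 executable=executable,
                                 environment=environment)
    for result in rows:
        grouped[result.benchmark_id].append(result.value)
    return grouped

For 400+ benchmarks that is only a few hundred rows per revision, so the in-memory footprint should stay small.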
You are not the first one to be bitten by this. One quick option is to disable the report creation/changes table caching.
A proper fix is more or less what you say. Memory shouldn't be a problem: even 500 benchmarks with 20 results/revisions each is only 10,000 result rows, which shouldn't really be a memory issue.
Is there a short-term patch I can apply to avoid all of that work? I took a look at views.py and see that it calls create_report_if_enough_data which appears to kick off all of those queries. Is it sufficient to just comment that out? Or will that break other things down the line?
FYI, each POST to /result/add/ on a "small" EC2 instance takes 9.5 seconds to complete. So, every time I do a benchmark run I produce about 420 results per iteration (JIT-on, JIT-off, etc). These 420 results take approximately 1 hour and 15 minutes to post to the database. :(
Disabling reports should not break anything else. They are also created on demand if you browse the changes view. I think they are only useful to create automatically for the landing page of codespeed.
One other option for you might be to post all the results at once via /result/add/json/.
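For reference, a minimal client-side sketch of what that could look like (Python 2-style, matching the era of the sample scripts shipped with codespeed; the 'json' form field and the result field names are from memory, so double-check them against your instance):

import json
import urllib
import urllib2

# One dict per result; batching them into a single POST avoids paying the
# per-request overhead 400+ times.
results = [
    {'commitid': '8de6f245c959', 'branch': 'default', 'project': 'MyProject',
     'executable': 'myexe JIT-on', 'benchmark': 'float',
     'environment': 'EC2 small', 'result_value': 2500.0},
    # ... the remaining results ...
]

data = urllib.urlencode({'json': json.dumps(results)})
response = urllib2.urlopen('http://speedcenter/result/add/json/', data)
print response.read()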
Can you suggest which line I should comment out? I was looking at commenting out the call to create_report_if_enough_data in the add_result method (both are in views.py). Is that right?
Yes, commenting that call out in line 839 should do it.
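For anyone else applying the same stopgap, it boils down to something like this. The surrounding code is paraphrased (save_result and the call's arguments are placeholders, not the real upstream source); only the commented-out call matters:

# Paraphrased sketch of the stopgap in codespeed/views.py; the real
# add_result() does more validation than shown here.
from django.http import HttpResponse, HttpResponseBadRequest

def add_result(request):
    result, error = save_result(request.POST)  # placeholder for the existing save logic
    if error:
        return HttpResponseBadRequest(result)
    # Stopgap: skip the automatic report generation that triggers the 400+
    # selects. Reports are still built on demand from the changes view.
    # create_report_if_enough_data(result)
    return HttpResponse("Result data saved successfully")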
I picked some low-hanging fruit for optimizing Report.get_changes_table generation. It's about 4 times faster now. It would be possible to optimize it further (e.g. getting the result average for all benchmarks at once), but that might reduce flexibility if one wanted to do more analysis later.
https://github.com/squiddy/codespeed/tree/report_query_optimization (not fully tested for correctness, but that's the idea).
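To illustrate the "average for all benchmarks at once" idea, here is a sketch using a single aggregated query; the field names are assumptions rather than the exact schema:

from django.db.models import Avg
from codespeed.models import Result

def average_per_benchmark(revisions, executable, environment):
    # revisions: the last `trend_depth` Revision objects.
    # One GROUP BY query returns the average value per benchmark over the
    # whole trend window, instead of one query per benchmark.
    rows = (Result.objects
            .filter(revision__in=revisions,
                    executable=executable,
                    environment=environment)
            .values('benchmark')                 # GROUP BY benchmark_id
            .annotate(avg_value=Avg('value')))   # AVG(value) per group
    return dict((row['benchmark'], row['avg_value']) for row in rows)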
My question: is this speed-up any help, or is it better to discuss what statistics are planned (@tobami, you said you planned for a lot more statistics) and work out how to do them efficiently?
Of course they are useful!
Bear in mind that for PyPy's dataset it got so heavy that I introduced caching of the changes table. If it gets really efficient we can ditch that, because the caching introduces problems of its own. For example, selecting a different default trend value (admittedly a rare case) renders the cached changes table out of date, with actually invalid trend data.
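One way to sidestep that would be to make the cache key include whatever parameters affect the table, so a different trend value is simply a cache miss instead of stale data. A rough sketch of the idea, not how the current table cache works:

from django.core.cache import cache

def cached_changes_table(report, trend_depth):
    # Key the cache on the report and on every parameter that influences the
    # table; changing the default trend value then misses the cache instead
    # of returning invalid trend data.
    key = 'changes-table:%d:trend=%d' % (report.pk, trend_depth)
    table = cache.get(key)
    if table is None:
        table = report.get_changes_table(trend_depth=trend_depth)
        cache.set(key, table, 60 * 60)  # keep for an hour
    return table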
So I want to check how fast it would now be without caching.
Some results on PyPy data with all of the optimizations. It's possible I've misunderstood the code and it's creating wrong results, but I'll clean up the branch and then you can tell me.
from codespeed.models import Report
a = Report.objects.all().order_by('-revision__date')[0]
<Report: Report for Dec 08, 11:17 - 50316:8de6f245c959>
%timeit a.get_changes_table(trend_depth=11)
master:    1 loops, best of 3: 631 ms per loop
optimized: 10 loops, best of 3: 22.5 ms per loop
cache hit: 1000 loops, best of 3: 439 us per loop
I better leave the measuring up to you, but at least it looks promising. :-)
Cleaned up version: https://github.com/squiddy/codespeed/tree/report_query_optimization
That looks quite good!
What do the "cache hit" results really mean? Are they really only 0.439 ms? And why does the number of loops change for each case? That does not make it an apples-to-apples comparison. It would also be better to report the average or even the worst time instead of the best.
The cache hit (self._get_tablecache()) results probably mean nothing; I was just curious how it compares.
I didn't change the loop count myself; it is chosen automatically by the %timeit magic of the IPython shell.
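If it helps the comparison, the loop and repeat counts can be pinned explicitly, and all runs reported instead of just the best one; for example:

# In IPython, fix the counts for both branches:
#     %timeit -n 10 -r 5 a.get_changes_table(trend_depth=11)

# Or use the timeit module directly and look at every repeat:
import timeit

runs = timeit.repeat(
    stmt="a.get_changes_table(trend_depth=11)",
    setup=("from codespeed.models import Report; "
           "a = Report.objects.all().order_by('-revision__date')[0]"),
    repeat=5, number=10)
per_call = [t / 10 for t in runs]  # seconds per call for each repeat
print(per_call)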