Run automatic benchmarks
We should port Brian's mechanism for automatically running benchmarks and come up with a couple of benchmarks, both micro-benchmarks and realistic model simulations (e.g. the classical CUBA and COBA simulations). This should include comparing the different code generation targets, and maybe comparisons against a reference.
It would also be good to include memory usage in addition to speed, but I'm not sure how feasible that is.
This issue is not a high priority, though, first we need to make the simulations run, then we can worry about performance :)
The tool "airspeed velocity" (http://asv.readthedocs.io/en/latest/) seems to be what the cool kids are using nowadays (see e.g. https://pv.github.io/numpy-bench/).
Looks like a nice tool indeed.
I've set up a basic framework for benchmarking in the brian-team/brian2_benchmarks repository. It seems to work fine so far. I added a few "full" benchmarks (examples like CUBA and COBAHH, but nothing yet for multicompartmental modeling), and just one example of a "microbenchmark". I think it is good to have a few complex examples that track real-world performance and can show regressions/improvements that are not covered by a microbenchmark. On the other hand, we shouldn't add too many of them, as they generally take quite a while to execute. Also, microbenchmarks of course give a much clearer idea of where a performance improvement/regression is coming from.
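For anyone wanting to reproduce this setup: an asv benchmark repository is driven by an `asv.conf.json` file at its top level. A minimal sketch could look like the following (the `project`, `repo`, and directory values here are assumptions based on the repository mentioned above, not a copy of the actual config; asv's config format allows `//` comments):

```json
{
    // version of the asv config format, not of the project
    "version": 1,
    "project": "brian2",
    "project_url": "http://briansimulator.org/",
    // asv clones this repository and runs the benchmarks against its commits
    "repo": "https://github.com/brian-team/brian2.git",
    "branches": ["master"],
    "environment_type": "conda",
    // directory (relative to this file) containing the benchmark modules
    "benchmark_dir": "benchmarks",
    "results_dir": "results",
    "html_dir": "html"
}
```

With a config like this in place, `asv run` executes the benchmarks for the selected commits and `asv publish` generates the static HTML report.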
I'll run a few example benchmarks and publish the results on a fancy github page (the asv tool takes care of that).
Would it be useful to have a dedicated (older) machine for this sort of thing?
In principle it could be useful, we could have it run the benchmarks for each and every commit. But I'm not 100% sure that it is worth the effort. Either way, the tool keeps track of the machine that was used for running the benchmarks -- it might be useful to at least run things on a Windows and a Linux machine, maybe?
Well I could probably find an unused older machine lying around here somewhere that could be dedicated to this task, but only if it's worth the effort and it sounds like maybe not?
We'd probably have a machine that could be used for this as well... From my experience, configuring a machine to continuously run these things comes with a bit of a maintenance cost (not to speak of electricity ;) ) -- there are updates, etc. I think at least for now I'd prefer to run benchmarks on my machine every now and then, e.g. when we plan to do a new release, similar to what we do for the examples.
Oh, here are the first benchmark results BTW: https://brian-team.github.io/brian2_benchmarks/ There are only four data points per benchmark, and for one of them I intentionally used the commit that introduced the large performance regression in StateMonitor. But it should already give an idea of what that thing looks like. Not sure what to make of the small variations (e.g. in COBAHH), but this should be clearer with more samples, I guess. I'll run more simulations over the weekend.
Agreed! The benchmark system looks great. Can it do memory performance too?
Yes, it can measure the size of Python objects (less useful for us I guess), and the approximate peak memory usage during a test. One of the uploaded benchmarks actually measures the peak memory for the StateMonitor "micro benchmark": https://brian-team.github.io/brian2_benchmarks/#benchmarks.Microbenchmarks.peakmem_statemonitor
(Explanation: you simply name a benchmark function time_... to measure its runtime or peakmem_... to measure its peak memory use -- it couldn't be much simpler than that.)
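To sketch that naming convention (with hypothetical benchmark names, and a plain-Python workload standing in for an actual Brian simulation), a benchmark file in the asv suite is just a module of classes and functions whose name prefixes tell asv what to measure:

```python
# Hypothetical asv benchmark module: asv discovers benchmarks by name prefix.
#   time_*    -> wall-clock runtime of the call is measured
#   peakmem_* -> peak memory usage during the call is measured


class Microbenchmarks:
    def setup(self):
        # setup() runs before each benchmark and is excluded from the timing
        self.data = list(range(100_000))

    def time_running_sum(self):
        # asv times this call; the loop is a stand-in for a real workload
        # such as recording values into a StateMonitor
        total = 0
        for x in self.data:
            total += x
        self.total = total

    def peakmem_copies(self):
        # asv reports the peak memory reached while this method runs
        self.copies = [list(self.data) for _ in range(10)]
```

asv instantiates the class and calls each `time_`/`peakmem_` method itself, so there is no boilerplate beyond choosing the right prefix.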
Very nice indeed!