IAI: CPU profiler
This is yet another "for discussion" PR:
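```rust
use atomic_lib::Storelike;
use iai::{black_box, Iai};

// Measures only the setup (initializing the store), so its counts
// can be subtracted from the other benchmarks.
fn bench_empty(iai: &mut Iai) {
    iai.run(|| {
        let _store = atomic_lib::Store::init().unwrap();
    });
}

// Measures the same setup plus a call to `all_resources`.
fn bench_all_resources(iai: &mut Iai) {
    iai.run(|| {
        let store = atomic_lib::Store::init().unwrap();
        // black_box keeps the compiler from optimizing the call away.
        black_box(store.all_resources(black_box(true)).len());
    });
}

iai::main!(bench_empty, bench_all_resources);
```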
This is an example of how to use IAI for CPU benchmarking. Because IAI cannot exclude setup code, the setup has to be measured separately (`bench_empty`) and its values subtracted from the result. The output:
```
Running `target/debug/examples/iai_cpu_benchmark`
bench_empty
  Instructions:             2398336 (   +inf%)
  L1 Accesses:              4051806 (   +inf%)
  L2 Accesses:                56734 (+1418250%)
  RAM Accesses:                4048 (   +inf%)
  Estimated Cycles:         4477156 (+22385680%)

bench_all_resources
  Instructions:             3026321
  L1 Accesses:              5078774
  L2 Accesses:                63863
  RAM Accesses:                5272
  Estimated Cycles:         5582609
```
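To get a rough net cost of `all_resources` alone, the `bench_empty` counts can be subtracted from the `bench_all_resources` counts. A minimal sketch using the numbers above; the `Counts` struct and `net_of` helper are hypothetical and not part of iai, and the result is only approximate, since the setup may not cost exactly the same in both runs:

```rust
/// Hypothetical container for the counters iai prints per benchmark.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Counts {
    instructions: u64,
    l1_accesses: u64,
    l2_accesses: u64,
    ram_accesses: u64,
    estimated_cycles: u64,
}

/// Subtracts a setup-only baseline run from a full run.
/// `saturating_sub` clamps at zero instead of underflowing.
fn net_of(full: Counts, baseline: Counts) -> Counts {
    Counts {
        instructions: full.instructions.saturating_sub(baseline.instructions),
        l1_accesses: full.l1_accesses.saturating_sub(baseline.l1_accesses),
        l2_accesses: full.l2_accesses.saturating_sub(baseline.l2_accesses),
        ram_accesses: full.ram_accesses.saturating_sub(baseline.ram_accesses),
        estimated_cycles: full.estimated_cycles.saturating_sub(baseline.estimated_cycles),
    }
}

fn main() {
    // Values copied from the iai output for the two benchmarks.
    let bench_empty = Counts {
        instructions: 2_398_336,
        l1_accesses: 4_051_806,
        l2_accesses: 56_734,
        ram_accesses: 4_048,
        estimated_cycles: 4_477_156,
    };
    let bench_all_resources = Counts {
        instructions: 3_026_321,
        l1_accesses: 5_078_774,
        l2_accesses: 63_863,
        ram_accesses: 5_272,
        estimated_cycles: 5_582_609,
    };

    let net = net_of(bench_all_resources, bench_empty);
    println!("net instructions: {}", net.instructions); // prints "net instructions: 627985"
    println!("net estimated cycles: {}", net.estimated_cycles); // prints "net estimated cycles: 1105453"
}
```

With these numbers, roughly a fifth of the measured instructions in `bench_all_resources` come from `all_resources` itself; the rest is store initialization.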
It's Linux-only due to its dependency on Valgrind (Cachegrind), but as far as I know it's the simplest tool to use. It uses a fork of iai (https://github.com/reknih/iai), which I moved into my own repo.
Thanks for this!
I'm not entirely sure what the added benefit is on top of regular (millisecond-based) benchmarking, though. Can you elaborate on that?
It reports CPU instruction counts plus L1, L2 and RAM accesses, from which it estimates CPU cycles. Ideally I would like to cover three things: overall time with ms/ns benchmarks (Criterion), which indicates user experience; memory usage (memory profiling via profiling crates with the jemalloc allocator); and CPU usage. From the IAI author: "For benchmarks that run in CI (especially if you're checking for performance regressions in pull requests on cloud CI) you should use Iai. For benchmarking on Windows or other platforms that Valgrind doesn't support, you should use Criterion-rs. For other cases, I would advise using both. Iai gives more precision and scales better to larger benchmarks, while Criterion-rs allows for excluding setup time and gives you more information about the actual time your code takes and how strongly that is affected by non-determinism like threading or hash-table randomization. If you absolutely need to pick one or the other though, Iai is probably the one to go with."
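Note that "Estimated Cycles" is not a hardware measurement. In the output above it equals a fixed weighting of the cache counters, L1 + 5 × L2 + 35 × RAM, which reproduces both reported values exactly. A small sketch of that calculation; the function name is ours for illustration, not part of iai's API:

```rust
/// Weighted cycle estimate from Cachegrind-style counters:
/// each L1 access counts 1, each L2 access 5, each RAM access 35.
fn estimated_cycles(l1_accesses: u64, l2_accesses: u64, ram_accesses: u64) -> u64 {
    l1_accesses + 5 * l2_accesses + 35 * ram_accesses
}

fn main() {
    // Counters reported for bench_empty and bench_all_resources.
    println!("bench_empty: {}", estimated_cycles(4_051_806, 56_734, 4_048)); // prints "bench_empty: 4477156"
    println!("bench_all_resources: {}", estimated_cycles(5_078_774, 63_863, 5_272)); // prints "bench_all_resources: 5582609"
}
```

The heavy weight on RAM accesses means a small increase in cache misses can dominate the estimate even when the instruction count barely changes.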
Can you rebase this onto master? Also, could you add instructions for CPU profiling to contribute.md, if any new instructions are relevant?