IAI: CPU profiler

Open AlexMikhalev opened this issue 3 years ago • 3 comments

This is yet another "for discussion" PR:

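// Storelike provides `all_resources` on the store.
use atomic_lib::Storelike;

use iai::{black_box, Iai};

// Baseline: measures only the cost of initializing the store.
fn bench_empty(iai: &mut Iai) {
    iai.run(|| {
        let _store = atomic_lib::Store::init().unwrap();
    });
}

// Store initialization plus listing all resources.
fn bench_all_resources(iai: &mut Iai) {
    iai.run(|| {
        let store = atomic_lib::Store::init().unwrap();
        // black_box keeps the compiler from optimizing the call away.
        black_box(store.all_resources(black_box(true)).len());
    });
}

iai::main!(bench_empty, bench_all_resources);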

This is an example of how to use IAI for CPU benchmarking. The setup code should be measured separately and its values subtracted from the result. The output:

 Running `target/debug/examples/iai_cpu_benchmark`
bench_empty
  Instructions:             2398336 (  +inf%)
  L1 Accesses:              4051806 (  +inf%)
  L2 Accesses:                56734 (+1418250%)
  RAM Accesses:                4048 (  +inf%)
  Estimated Cycles:         4477156 (+22385680%)

bench_all_resources
  Instructions:             3026321
  L1 Accesses:              5078774
  L2 Accesses:                63863
  RAM Accesses:                5272
  Estimated Cycles:         5582609

It's Linux-only because it depends on Valgrind (Cachegrind), but as far as I know it's the simplest option to use. It uses the https://github.com/reknih/iai fork of iai, which I moved into my own repo.
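
To make the "subtract the setup" step concrete, here is a minimal sketch that applies it to the numbers in the output above. The `Counts` struct and its methods are hypothetical helpers, not part of iai. The cycle estimate assumes the weighting L1 + 5*L2 + 35*RAM, which reproduces the "Estimated Cycles" values reported for both benchmarks.

```rust
/// Counters reported for one benchmark (hypothetical helper type, not an iai API).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Counts {
    instructions: u64,
    l1: u64,
    l2: u64,
    ram: u64,
}

impl Counts {
    /// Estimated cycles, assuming the weighting L1 + 5*L2 + 35*RAM.
    fn estimated_cycles(&self) -> u64 {
        self.l1 + 5 * self.l2 + 35 * self.ram
    }

    /// Subtracts a setup-only baseline; saturates at zero instead of underflowing.
    fn minus(&self, base: &Counts) -> Counts {
        Counts {
            instructions: self.instructions.saturating_sub(base.instructions),
            l1: self.l1.saturating_sub(base.l1),
            l2: self.l2.saturating_sub(base.l2),
            ram: self.ram.saturating_sub(base.ram),
        }
    }
}

/// Values copied from the `bench_empty` output.
fn bench_empty_counts() -> Counts {
    Counts { instructions: 2_398_336, l1: 4_051_806, l2: 56_734, ram: 4_048 }
}

/// Values copied from the `bench_all_resources` output.
fn bench_all_resources_counts() -> Counts {
    Counts { instructions: 3_026_321, l1: 5_078_774, l2: 63_863, ram: 5_272 }
}

fn main() {
    let net = bench_all_resources_counts().minus(&bench_empty_counts());
    println!("{:?}", net);
    println!("estimated cycles: {}", net.estimated_cycles());
}
```

With these inputs, the net cost attributable to `all_resources` comes out at 627985 instructions and 1105453 estimated cycles, which equals the difference between the two reported "Estimated Cycles" lines.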

AlexMikhalev avatar Apr 21 '22 14:04 AlexMikhalev

Thanks for this!

I'm not entirely sure what the added benefit is on top of regular (milliseconds-based) benchmarking, though. Can you elaborate on that?

joepio avatar Apr 21 '22 14:04 joepio

It counts CPU cycles, including L1 and L2 cache accesses. Ideally I would like to cover three things: overall time in an ms/ns benchmark (Criterion), which reflects user experience; memory usage (memory_profile via profile crates with the jemalloc allocator); and CPU usage. From the IAI author: "For benchmarks that run in CI (especially if you're checking for performance regressions in pull requests on cloud CI) you should use Iai. For benchmarking on Windows or other platforms that Valgrind doesn't support, you should use Criterion-rs. For other cases, I would advise using both. Iai gives more precision and scales better to larger benchmarks, while Criterion-rs allows for excluding setup time and gives you more information about the actual time your code takes and how strongly that is affected by non-determinism like threading or hash-table randomization. If you absolutely need to pick one or the other though, Iai is probably the one to go with."
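
To illustrate what "excluding setup time" means for a wall-clock benchmark, here is a std-only sketch. It is not Criterion's API; `time_excluding_setup` is a hypothetical helper that runs the setup before each iteration and starts the clock only around the routine.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `setup` before every iteration but times only `routine`.
/// Hypothetical helper for illustration; not part of Criterion or iai.
fn time_excluding_setup<S, I, R, O>(iters: u32, mut setup: S, mut routine: R) -> Duration
where
    S: FnMut() -> I,
    R: FnMut(I) -> O,
{
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        let input = setup(); // outside the timed region
        let start = Instant::now();
        let out = routine(input); // the only timed part
        total += start.elapsed();
        black_box(out); // keep the result from being optimized away
    }
    total
}

fn main() {
    let total = time_excluding_setup(
        100,
        || (0..10_000u64).collect::<Vec<u64>>(), // setup: build the input
        |v| v.iter().sum::<u64>(),               // routine: the code under test
    );
    println!("routine only: {:?} over 100 iterations", total);
}
```

The instruction counts above cannot be split this way inside a single run, which is why the setup is measured as a separate `bench_empty` baseline and subtracted afterwards.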

AlexMikhalev avatar Apr 21 '22 14:04 AlexMikhalev

Can you rebase this onto master? One more request: can you add instructions for CPU profiling to contribute.md, if any new ones are relevant?

joepio avatar Apr 27 '22 09:04 joepio