LinuxPerf.jl

`reset!` causes incorrect scaling for `Stats`

Open · topolarity opened this issue 1 year ago · 1 comment

julia> bench = LinuxPerf.make_bench();
julia> enable!(bench); sleep(1.0); disable!(bench);
julia> stats = LinuxPerf.ThreadStats(bench);
julia> stats.groups[1][1]
LinuxPerf.Counter(hw:cycles, 0x0000000001bdc0f6, 0x0000000000a08a80, 0x00000000007a6ed7)

julia> reset!(bench)
julia> stats = LinuxPerf.ThreadStats(bench);
julia> stats.groups[1][1]
LinuxPerf.Counter(hw:cycles, 0x0000000000000000, 0x0000000003dad501, 0x00000000026bd3ce)

Notice that `PERF_EVENT_IOC_RESET` resets the value of the counter, but doesn't affect `time_enabled` or `time_running`.

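It's hard to say what the correct behavior is here, but this means that the `Counter` object is no longer valid after a reset: its value covers only the time since the reset, while its running / enabled ratio still covers the whole lifetime of the event, so the ratio is not what the object thinks it is: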

julia> stats
╶ cpu-cycles               0.00e+00   76.3%  #  0.0 cycles per ns
┌ cache-references         0.00e+00   62.0%
└ cache-misses             0.00e+00   62.0%  #  NaN% of cache refs
┌ branch-instructions      0.00e+00   61.8%  #  NaN% of insns
│ branch-misses            0.00e+00   61.8%  #  NaN% of branch insns
└ instructions             0.00e+00   61.8%  #  NaN insns per cycle
┌ context-switches         0.00e+00  100.0%
│ page-faults              0.00e+00  100.0%
│ minor-faults             0.00e+00  100.0%
│ major-faults             0.00e+00  100.0%
└ cpu-migrations           0.00e+00  100.0%
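
The mismatch can be checked directly from the raw fields in the two `Counter` printouts above. This is only a sketch: it assumes the three hex numbers are, in order, the value, `time_enabled` and `time_running`, and it uses the usual perf multiplexing estimate `value * enabled / running`. The helper names `running_fraction` and `scaled` are made up for illustration and are not LinuxPerf.jl API.

```julia
# Raw fields copied from the two `Counter` printouts above.
# Assumed field order: value, time_enabled, time_running.
before_reset = (value = 0x0000000001bdc0f6, enabled = 0x0000000000a08a80, running = 0x00000000007a6ed7)
after_reset  = (value = 0x0000000000000000, enabled = 0x0000000003dad501, running = 0x00000000026bd3ce)

# Fraction of the enabled time during which the event was actually counting.
running_fraction(c) = c.running / c.enabled

# Usual multiplexing estimate: extrapolate the count to the full enabled time.
scaled(c) = c.value * (c.enabled / c.running)

# The value went back to zero, but both times kept accumulating, so the
# fraction after the reset is still a lifetime average.
println("running fraction before reset: ", running_fraction(before_reset))
println("running fraction after reset:  ", running_fraction(after_reset))
println("scaled count after reset:      ", scaled(after_reset))
```
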

If you reset in a loop, each measurement ends up scaled by the average running fraction since the event was opened, rather than by the true running fraction for the latest sample.
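
One possible way around this, sketched here under the assumption that raw (value, enabled, running) triples can be read before and after each iteration, is to skip `reset!` and scale the difference between two snapshots instead. `Snapshot` and `scaled_delta` are hypothetical names used only for illustration, and the numbers are made up.

```julia
# Hypothetical stand-in for one raw perf read; the fields mirror the three
# numbers shown in the `LinuxPerf.Counter` printouts (value, enabled, running).
struct Snapshot
    value::UInt64
    enabled::UInt64
    running::UInt64
end

# Estimate the count for the interval between two snapshots, scaling by the
# enabled/running ratio of that interval only, not of the whole lifetime.
function scaled_delta(before::Snapshot, after::Snapshot)
    dvalue   = after.value   - before.value
    denabled = after.enabled - before.enabled
    drunning = after.running - before.running
    drunning == 0 && return NaN   # the event was never scheduled in this interval
    return Float64(dvalue) * Float64(denabled) / Float64(drunning)
end

# Made-up numbers: the event ran 90% of the time before the first snapshot,
# but only 50% of the time during the latest interval.
a = Snapshot(1_000, 10_000, 9_000)
b = Snapshot(1_500, 12_000, 10_000)

estimate = scaled_delta(a, b)   # uses the 50% ratio of the latest interval
println(estimate)
```

Because both time fields only ever grow, the per-interval deltas stay meaningful without any reset, at the cost of keeping the previous snapshot around.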

topolarity, Dec 20 '24 01:12

Looks like we can't change this behaviour, and the package only uses it to reset the counter. From the `perf_event_open` man page:

PERF_EVENT_IOC_RESET
              Reset the event count specified by the file descriptor
              argument to zero.  This resets only the counts; there is
              no way to reset the multiplexing time_enabled or
              time_running values.

I am slightly concerned that we use this right after opening an event. Does the counter not start at 0? If not, is the same true for the time running and time enabled fields?

Zentrik, Dec 20 '24 09:12