LinuxPerf.jl

Use prctl to enable/disable perf for lower overhead

Open Zentrik opened this issue 1 year ago • 2 comments

prctl enables or disables all benches at once, so you need to close each bench when you're done with it. Otherwise you'll quickly accumulate too many benches and your new ones won't produce any results (I'm not sure whether the old ones still would). I've therefore added close as an export. enable! and disable! are still useful, so I haven't removed them.
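For readers unfamiliar with the mechanism, here is a minimal sketch of toggling all of a task's perf counters through libc's prctl. It is not the code in this PR: the helper names prctl_perf and count_region are hypothetical, and the zero padding arguments are an assumption made because prctl is declared variadic in C. The option values 31 and 32 are PR_TASK_PERF_EVENTS_DISABLE and PR_TASK_PERF_EVENTS_ENABLE from <linux/prctl.h>. Linux only.

```julia
# Option values from <linux/prctl.h>.
const PR_TASK_PERF_EVENTS_DISABLE = Cint(31)
const PR_TASK_PERF_EVENTS_ENABLE  = Cint(32)

# Hypothetical helper: call libc's prctl and return its raw status
# (0 on success, -1 on failure). The trailing zeros fill the variadic
# arguments, which these two options do not use.
prctl_perf(option::Cint) =
    ccall(:prctl, Cint, (Cint, Culong...), option, 0, 0, 0, 0)

# Hypothetical helper: enable every perf counter owned by the calling task,
# run f, and disable the counters again even if f throws.
function count_region(f)
    prctl_perf(PR_TASK_PERF_EVENTS_ENABLE)
    try
        return f()
    finally
        prctl_perf(PR_TASK_PERF_EVENTS_DISABLE)
    end
end
```

Because the toggle applies to every counter the task has open rather than to one bench, a bench left open keeps counting in later regions, which is why closing benches matters here.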

Current

julia> @pstats nothing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌ cpu-cycles               3.94e+03  100.0%  #  0.0 cycles per ns
│ stalled-cycles-frontend  9.30e+02  100.0%  # 23.6% of cycles
└ stalled-cycles-backend   3.73e+02  100.0%  #  9.5% of cycles
┌ instructions             1.13e+03  100.0%  #  0.3 insns per cycle
│ branch-instructions      2.46e+02  100.0%  # 21.8% of insns
└ branch-misses            7.70e+01  100.0%  # 31.3% of branch insns
┌ task-clock               1.20e+05  100.0%  # 120.0 μs
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              0.00e+00  100.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

New

julia> @pstats nothing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌ cpu-cycles               1.22e+02  100.0%  #  0.0 cycles per ns
│ stalled-cycles-frontend  3.00e+01  100.0%  # 24.6% of cycles
└ stalled-cycles-backend   2.00e+00  100.0%  #  1.6% of cycles
┌ instructions             1.50e+01  100.0%  #  0.1 insns per cycle
│ branch-instructions      4.00e+00  100.0%  # 26.7% of insns
└ branch-misses            3.00e+00  100.0%  # 75.0% of branch insns
┌ task-clock               1.09e+05  100.0%  # 109.0 μs
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              0.00e+00  100.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The overhead could be lowered further by bypassing libc entirely, but I don't think there is a way to make this cross-platform in Julia:

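# x86-64 Linux only: issue the prctl syscall (number 157) directly via inline asm.
# Option 32 is PR_TASK_PERF_EVENTS_ENABLE.
enable_all!() = Base.llvmcall("""
%a = call i32 asm sideeffect "syscall", "={rax},{rax},{rdi},~{rcx},~{r11},~{memory}"(i64 157, i32 32)
ret i32 %a
""", Int32, Tuple{})
# Option 31 is PR_TASK_PERF_EVENTS_DISABLE.
disable_all!() = Base.llvmcall("""
%a = call i32 asm sideeffect "syscall", "={rax},{rax},{rdi},~{rcx},~{r11},~{memory}"(i64 157, i32 31)
ret i32 %a
""", Int32, Tuple{})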

which gives

julia> @pstats nothing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌ cpu-cycles               3.30e+01  100.0%  #  0.0 cycles per ns
│ stalled-cycles-frontend  2.00e+00  100.0%  #  6.1% of cycles
└ stalled-cycles-backend   1.70e+01  100.0%  # 51.5% of cycles
┌ instructions             3.00e+00  100.0%  #  0.1 insns per cycle
│ branch-instructions      1.00e+00  100.0%  # 33.3% of insns
└ branch-misses            1.00e+00  100.0%  # 100.0% of branch insns
┌ task-clock               1.01e+05  100.0%  # 100.5 μs
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              0.00e+00  100.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Zentrik • Apr 29 '24 13:04

I forgot to check whether prctl returns an error code; I'll add that in a bit.
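One way such a check could look, as a sketch rather than the change that was eventually pushed: libc's prctl returns -1 and sets errno on failure, and Base.systemerror turns that into a SystemError. The name checked_prctl is hypothetical, and the zero padding arguments are an assumption made because prctl is variadic in C.

```julia
# Option values from <linux/prctl.h>.
const PR_TASK_PERF_EVENTS_DISABLE = Cint(31)
const PR_TASK_PERF_EVENTS_ENABLE  = Cint(32)

# Hypothetical helper: call prctl and throw a SystemError carrying errno
# if the call reports failure (return value -1).
function checked_prctl(option::Cint)
    ret = ccall(:prctl, Cint, (Cint, Culong...), option, 0, 0, 0, 0)
    Base.systemerror(:prctl, ret == -1)
    return nothing
end

# Hypothetical wrappers for the two perf options.
checked_enable_all()  = checked_prctl(PR_TASK_PERF_EVENTS_ENABLE)
checked_disable_all() = checked_prctl(PR_TASK_PERF_EVENTS_DISABLE)
```

The check adds a compare and branch to each call, so it costs a few of the instructions the PR is trying to save. Whether that trade-off is acceptable is a design choice for the package.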

Zentrik • Apr 29 '24 14:04

Any objections to me merging this and tagging a new release?

Zentrik • May 08 '24 12:05