LinuxPerf.jl
Use prctl to enable/disable perf for lower overhead
prctl enables or disables all benches at once, so each bench has to be closed once you're done with it. Otherwise open benches quickly pile up and new ones stop returning any results (I'm not sure whether the old ones still do).
That's why I added close as an export. enable! and disable! are still useful, so I haven't removed them.
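For readers unfamiliar with the mechanism, the prctl-based toggle can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not necessarily the code in this PR: the helper names `perf_enable_all` and `perf_disable_all` are hypothetical, the option values are the ones defined in `<linux/prctl.h>`, and it only makes sense on Linux.

```julia
# Option values from <linux/prctl.h>.
const PR_TASK_PERF_EVENTS_DISABLE = Cint(31)
const PR_TASK_PERF_EVENTS_ENABLE  = Cint(32)

# Hypothetical helper names, for illustration only.
# A single prctl call toggles all perf counters attached to the calling
# process, which is why benches that are no longer needed must be closed.
perf_enable_all()  = ccall(:prctl, Cint, (Cint,), PR_TASK_PERF_EVENTS_ENABLE)
perf_disable_all() = ccall(:prctl, Cint, (Cint,), PR_TASK_PERF_EVENTS_DISABLE)

# Bracket the measured region with one call on each side (Linux only).
if Sys.islinux()
    perf_enable_all()
    total = sum(1:1000)   # code being measured
    perf_disable_all()
end
```

The point of the design is that the measured region is bracketed by exactly one cheap call on each side, regardless of how many counters are open.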
Current
julia> @pstats nothing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌ cpu-cycles 3.94e+03 100.0% # 0.0 cycles per ns
│ stalled-cycles-frontend 9.30e+02 100.0% # 23.6% of cycles
└ stalled-cycles-backend 3.73e+02 100.0% # 9.5% of cycles
┌ instructions 1.13e+03 100.0% # 0.3 insns per cycle
│ branch-instructions 2.46e+02 100.0% # 21.8% of insns
└ branch-misses 7.70e+01 100.0% # 31.3% of branch insns
┌ task-clock 1.20e+05 100.0% # 120.0 μs
│ context-switches 0.00e+00 100.0%
│ cpu-migrations 0.00e+00 100.0%
└ page-faults 0.00e+00 100.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
New
julia> @pstats nothing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌ cpu-cycles 1.22e+02 100.0% # 0.0 cycles per ns
│ stalled-cycles-frontend 3.00e+01 100.0% # 24.6% of cycles
└ stalled-cycles-backend 2.00e+00 100.0% # 1.6% of cycles
┌ instructions 1.50e+01 100.0% # 0.1 insns per cycle
│ branch-instructions 4.00e+00 100.0% # 26.7% of insns
└ branch-misses 3.00e+00 100.0% # 75.0% of branch insns
┌ task-clock 1.09e+05 100.0% # 109.0 μs
│ context-switches 0.00e+00 100.0%
│ cpu-migrations 0.00e+00 100.0%
└ page-faults 0.00e+00 100.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The overhead could be lowered further by bypassing libc entirely, but I don't think there is a way to make this cross-platform in Julia (the syscall number and register names below are x86-64 specific):
# x86-64 Linux only: syscall 157 is prctl, option 32 is PR_TASK_PERF_EVENTS_ENABLE.
enable_all!() = Base.llvmcall("""
    %a = call i32 asm sideeffect "syscall", "={rax},{rax},{rdi},~{rcx},~{r11},~{memory}"(i64 157, i32 32)
    ret i32 %a
    """, Int32, Tuple{})
# Option 31 is PR_TASK_PERF_EVENTS_DISABLE.
disable_all!() = Base.llvmcall("""
    %a = call i32 asm sideeffect "syscall", "={rax},{rax},{rdi},~{rcx},~{r11},~{memory}"(i64 157, i32 31)
    ret i32 %a
    """, Int32, Tuple{})
which gives
julia> @pstats nothing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌ cpu-cycles 3.30e+01 100.0% # 0.0 cycles per ns
│ stalled-cycles-frontend 2.00e+00 100.0% # 6.1% of cycles
└ stalled-cycles-backend 1.70e+01 100.0% # 51.5% of cycles
┌ instructions 3.00e+00 100.0% # 0.1 insns per cycle
│ branch-instructions 1.00e+00 100.0% # 33.3% of insns
└ branch-misses 1.00e+00 100.0% # 100.0% of branch insns
┌ task-clock 1.01e+05 100.0% # 100.5 μs
│ context-switches 0.00e+00 100.0%
│ cpu-migrations 0.00e+00 100.0%
└ page-faults 0.00e+00 100.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
I forgot to check whether prctl returns an error code; I'll add that in a bit.
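One possible shape for that check is sketched below. It is only an illustration: `checked_prctl` is a hypothetical name, and the sketch relies on prctl returning -1 and setting errno on failure, which `Base.systemerror` turns into a `SystemError`.

```julia
# Hypothetical wrapper, for illustration only (Linux only).
# Throws a SystemError carrying errno if the prctl call fails.
function checked_prctl(option::Integer)
    ret = ccall(:prctl, Cint, (Cint,), option)
    # prctl returns -1 and sets errno on failure.
    Base.systemerror(:prctl, ret < 0)
    return nothing
end

if Sys.islinux()
    checked_prctl(32)   # PR_TASK_PERF_EVENTS_ENABLE
    checked_prctl(31)   # PR_TASK_PERF_EVENTS_DISABLE
end
```

A check placed right after the enable call runs inside the measured region, so it would probably add a few instructions to the counts shown above; keeping the return value and checking it after the disable call might be an alternative.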
Any objections to me merging this and tagging a new release?