Faster next / finish by using `rb_profile_frames`
Description
- By using rb_profile_frames instead of
rb_make_backtracein DEBUGGER__.frame_depth, we can improve the performance of frame_depth substantially for common cases. This leads to much fasternextandfinishperformance. -
rb_profile_framesis notably faster thanrb_make_backtraceas it returns pointers and does not perform allocations (which rb_make_backtrace does). - We must set a buffer size for
rb_profile_frameswhich we have picked here as 4096. This should cover the vast majority of calls. (i.e. the common stack depth in our large codebase is in the hundreds). There is a fallback torb_make_backtraceto support the very large stack depth cases.- Note: it would have been possible to use a smaller fixed size buffer to iterate over the entire stack with
rb_profile_framesand not need the fallback, however there is currently an open issue regarding the "start" parameter not working properly, which would be required for that.
- Note: it would have been possible to use a smaller fixed size buffer to iterate over the entire stack with
Co-authored by: @jhawthorn
Note
Currently, this approach will only work with ruby >= 3.0.
It appears that for ruby < 3.0, rb_profile_frames omits c frames. So before moving forward, we would need to modify the logic here to support ruby < 3.0, or just restrict the rb_profile_frames approach to only be used for ruby >= 3.0.
Examples/validation
This file simulates a large callstack with deep recursion + method call:
# target.rb
def foo
"hello"
end
def recursive(n,stop)
foo
return if n >= stop
recursive(n + 1, stop)
end
recursive(0,1000) # line 13
Testing a step over the call on line 13 with this command:
time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
Baseline - no stepping:
$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m0.197s
user 0m0.151s
sys 0m0.052s
Stepping before this change:
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m2.024s
user 0m1.987s
sys 0m0.040s
Stepping after this change:
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m0.285s
user 0m0.218s
sys 0m0.069s
- So this gives a ~95% improvement in the performance of
nextfor the example case. - It also gives a ~80% performance improvement in the rare fallback case, where the stack depth exceeds 4096.
Expand to see the fallback results here
Extreme case: stack depth larger than BUFF_SIZE We can test this with:
# target.rb
def foo
"hello"
end
def recursive(n,stop)
foo
return if n >= stop
recursive(n + 1, stop)
end
recursive(0,4500) # line 13. Ensure stack depth > 4096
Baseline - no stepping:
$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m0.197s
user 0m0.187s
sys 0m0.013s
Stepping before this change:
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m35.587s
user 0m35.536s
sys 0m0.045s
Stepping after this change:
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m7.375s
user 0m7.221s
sys 0m0.158s
^ It gives an ~80% performance improvement for this extreme example, where we hit the rb_make_backtrace fallback.
Basically I agree with the idea but at least on ruby 2.7 and before it doesn't work because it skips C method frames. I need to check more it is same or not on CRuby's code.
NOTE: next command can be more lightweight.
- Providing stack-depth API for debuggers (I forget to introduce it to older Rubys. I need to keep in mind to introduce it into 3.2.
- Using internal hack, we can get real stack depth.
- Setting more clever breakpoints we can eliminate per-line traces for
nextcommand. But I'm not sure it is valuable (especially if this kind of optimization techniques works well).
@ko1 Yeah, currently this will only work for ruby 3.0 and up
Note
Currently, this approach will only work with ruby >= 3.0.
It appears that for ruby < 3.0,rb_profile_framesomits c frames. So before moving forward, we would need to modify the logic here to support ruby < 3.0, or just restrict therb_profile_framesapproach to only be used for ruby >= 3.0.
Is there any modification we could do for ruby <= 2.7? I don't think there would be a way to use rb_profile_frames to get the stack depth for ruby <= 2.7 but I could be wrong.
Alternatively, perhaps we could restrict the logic so that the rb_profile_frames approach is only used for ruby >= 3.0 environments. At least then we could begin getting a performance improvement for those versions.
- Providing stack-depth API for debuggers (I forget to introduce it to older Rubys. I need to keep in mind to introduce it into 3.2.
That sounds good. Maybe that would be a more long term improvement and this optimization could be seen as a short term way to improve the performance.
- Using internal hack, we can get real stack depth.
Is that something we could do now? Would that be a better way to optimize instead of using rb_profile_frames?
- Setting more clever breakpoints we can eliminate per-line traces for
nextcommand. But I'm not sure it is valuable (especially if this kind of optimization techniques works well).
Interesting, I am not sure if it would be needed, see the performance improvement in https://github.com/ruby/debug/pull/743#issuecomment-1250131651. It seems to work well. It might also be difficult to handle exceptions/conditional logic when setting the breakpoints in this way.
It might also be difficult to handle exceptions/conditional logic when setting the breakpoints in this way.
Correct.
@ko1 What do you think of this: https://github.com/ruby/debug/pull/746/commits/533fef0c35e80075b0da70a5bb46b20f7f06bbc3?diff=split&w ?
If we wanted to use the approach in this PR to make performance fast for ruby >= 3.0 we could do something like that.
That will allow use of rb_profile_frames in the ruby >=3.0 and use existing frame_depth implementation in other versions.
Github actions shows errors on Ruby 3.0? Could you rebase it and fix it?
On Ruby 3.2, I'll prepare the new debug feature to get depth directly.
Yeah, I can work on a fix for the failing versions 👍
Alright @ko1, I've fixed this in https://github.com/ruby/debug/pull/746/commits/55a0115c610f10039d36aa91420e68a02948f2e8 and checks are all passing 👍
- can you fix the conflict?
- can you squash commits if the separation of the commits is not important? (or I can merge with squash)
- can you fix the conflict?
Done.
- can you squash commits if the separation of the commits is not important? (or I can merge with squash)
@ko1 you can go ahead and merge with squash 👍