debug Faster next / finish by using `rb_profile

Description

By using rb_profile_frames instead of rb_make_backtrace in DEBUGGER__.frame_depth, we can improve the performance of frame_depth substantially for common cases. This leads to much faster next and finish performance.
rb_profile_frames is notably faster than rb_make_backtrace as it returns pointers and does not perform allocations (which rb_make_backtrace does).
We must set a buffer size for rb_profile_frames which we have picked here as 4096. This should cover the vast majority of calls. (i.e. the common stack depth in our large codebase is in the hundreds). There is a fallback to rb_make_backtrace to support the very large stack depth cases.
- Note: it would have been possible to use a smaller fixed size buffer to iterate over the entire stack with rb_profile_frames and not need the fallback, however there is currently an open issue regarding the "start" parameter not working properly, which would be required for that.

Co-authored by: @jhawthorn

Note Currently, this approach will only work with ruby >= 3.0. It appears that for ruby < 3.0, rb_profile_frames omits c frames. So before moving forward, we would need to modify the logic here to support ruby < 3.0, or just restrict the rb_profile_frames approach to only be used for ruby >= 3.0.

Examples/validation

This file simulates a large callstack with deep recursion + method call:

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

Testing a step over the call on line 13 with this command:

time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb

Baseline - no stepping:

$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.197s
user    0m0.151s
sys     0m0.052s

Stepping before this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.024s
user    0m1.987s
sys     0m0.040s

Stepping after this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.285s
user    0m0.218s
sys     0m0.069s

So this gives a ~95% improvement in the performance of next for the example case.
It also gives a ~80% performance improvement in the rare fallback case, where the stack depth exceeds 4096.

Expand to see the fallback results here

Extreme case: stack depth larger than BUFF_SIZE We can test this with:

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,4500) # line 13. Ensure stack depth > 4096

Baseline - no stepping:

$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.197s
user    0m0.187s
sys     0m0.013s

Stepping before this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m35.587s
user    0m35.536s
sys     0m0.045s

Stepping after this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m7.375s
user    0m7.221s
sys     0m0.158s

^ It gives an ~80% performance improvement for this extreme example, where we hit the rb_make_backtrace fallback.

Sep 01 '22 20:09 WillHalto

Basically I agree with the idea but at least on ruby 2.7 and before it doesn't work because it skips C method frames. I need to check more it is same or not on CRuby's code.

NOTE: next command can be more lightweight.

Providing stack-depth API for debuggers (I forget to introduce it to older Rubys. I need to keep in mind to introduce it into 3.2.
Using internal hack, we can get real stack depth.
Setting more clever breakpoints we can eliminate per-line traces for next command. But I'm not sure it is valuable (especially if this kind of optimization techniques works well).

Sep 16 '22 09:09 ko1

@ko1 Yeah, currently this will only work for ruby 3.0 and up

Note
Currently, this approach will only work with ruby >= 3.0.
It appears that for ruby < 3.0, rb_profile_frames omits c frames. So before moving forward, we would need to modify the logic here to support ruby < 3.0, or just restrict the rb_profile_frames approach to only be used for ruby >= 3.0.

Is there any modification we could do for ruby <= 2.7? I don't think there would be a way to use rb_profile_frames to get the stack depth for ruby <= 2.7 but I could be wrong.

Alternatively, perhaps we could restrict the logic so that the rb_profile_frames approach is only used for ruby >= 3.0 environments. At least then we could begin getting a performance improvement for those versions.

Sep 17 '22 21:09 WillHalto

Providing stack-depth API for debuggers (I forget to introduce it to older Rubys. I need to keep in mind to introduce it into 3.2.

That sounds good. Maybe that would be a more long term improvement and this optimization could be seen as a short term way to improve the performance.

Using internal hack, we can get real stack depth.

Is that something we could do now? Would that be a better way to optimize instead of using rb_profile_frames?

Setting more clever breakpoints we can eliminate per-line traces for next command. But I'm not sure it is valuable (especially if this kind of optimization techniques works well).

Interesting, I am not sure if it would be needed, see the performance improvement in https://github.com/ruby/debug/pull/743#issuecomment-1250131651. It seems to work well. It might also be difficult to handle exceptions/conditional logic when setting the breakpoints in this way.

Sep 17 '22 21:09 WillHalto

It might also be difficult to handle exceptions/conditional logic when setting the breakpoints in this way.

Correct.

Oct 04 '22 07:10 ko1

@ko1 What do you think of this: https://github.com/ruby/debug/pull/746/commits/533fef0c35e80075b0da70a5bb46b20f7f06bbc3?diff=split&w ?

If we wanted to use the approach in this PR to make performance fast for ruby >= 3.0 we could do something like that.

That will allow use of rb_profile_frames in the ruby >=3.0 and use existing frame_depth implementation in other versions.

Oct 11 '22 22:10 WillHalto

Github actions shows errors on Ruby 3.0? Could you rebase it and fix it?

On Ruby 3.2, I'll prepare the new debug feature to get depth directly.

Oct 28 '22 10:10 ko1

Yeah, I can work on a fix for the failing versions 👍

Oct 28 '22 14:10 WillHalto

Alright @ko1, I've fixed this in https://github.com/ruby/debug/pull/746/commits/55a0115c610f10039d36aa91420e68a02948f2e8 and checks are all passing 👍

Oct 29 '22 17:10 WillHalto

can you fix the conflict?
can you squash commits if the separation of the commits is not important? (or I can merge with squash)

Nov 02 '22 17:11 ko1

can you fix the conflict?

Done.

can you squash commits if the separation of the commits is not important? (or I can merge with squash)

@ko1 you can go ahead and merge with squash 👍

Nov 02 '22 18:11 WillHalto

Faster next / finish by using `rb_profile_frames`

Description

Examples/validation