profiler icon indicating copy to clipboard operation
profiler copied to clipboard

Repeated invocations of processThreadCPUDelta during symbolication take up more time than necessary

Open mstange opened this issue 4 years ago • 4 comments

Profile: https://share.firefox.dev/2X0vayw

Every time symbolication updates the thread, processThreadCPUDelta is called again for the modified profile. It seems like this is something that could be done only once. We could bake the guarantees that processThreadCPUDelta makes into the processed profile format. That means, instead of calling it from a selector, we would call it in these two places:

  • During profile processing, and
  • In an upgrader for the processed profile format.

It's not taking a ton of time, and symbolication is slow already, but it's one of many things that adds up to the slowness and it might be easy to fix.

mstange avatar Aug 05 '21 18:08 mstange

I remember discussing whether doing this in the processing step or in a selector when implementing it. And doing this in the selector seemed more beneficial at that time. I think there were a few reasons for this. But the biggest thing I remember was: In the first days of this feature, neither the front-end or nor the back-end was fully stable. So, it seemed better to do this in the selector to avoid any irreversible problems during the profile processing.

I think this isn't a huge problem for us anymore as the feature has become more mature since then. But I felt like it's always safer to do it in a selector instead of overwriting the thread values in the processing.

canova avatar Aug 09 '21 13:08 canova

This could stay as a selector but still be reworked a bit so that we can memoize the loop.

Simple solution: extract the inner loop into a memoized function, as it seems to only depend on threadCPUDeltaValue.

A bit less simple solution but maybe cleaner (but quite similar in the reasoning): break the existing selector into 2:

  1. one that would generate the new threadCPUDeltaValue, that is make a selector out of the inner loop only
  2. one that would take the result of that selector + getThread, and generate the new thread, that should be very cheap to run.

julienw avatar Aug 10 '21 12:08 julienw

The second solution may be simpler in the end, because we control better how many memoized values we'll keep.

julienw avatar Aug 10 '21 12:08 julienw

I like the idea to have a finer selector with memoization that's just based on the threadCPUDelta column.

mstange avatar Jun 08 '22 18:06 mstange