cloud-profiler-nodejs icon indicating copy to clipboard operation
cloud-profiler-nodejs copied to clipboard

This lib seems to block event loop

Open chrisdostert opened this issue 5 years ago • 10 comments

Environment details

  • OS: alpine linux
  • Node.js version: 14
  • @google-cloud/profiler version: 4.0.2

Steps to reproduce

  1. start an app with blocked-at and @google-cloud profiler wired up
  2. watch logs and you'll see frequent blocking operations logged between 2-8 seconds example:
Blocked for 7586.2085ms, operation started here: [
  '    at Profiler.collectProfile (/src/backend/node_modules/@google-cloud/profiler/src/profiler.ts:393:25)',
  '    at Profiler.runLoop (/src/backend/node_modules/@google-cloud/profiler/src/profiler.ts:379:36)',
  '    at listOnTimeout (internal/timers.js:551:17)',
  '    at processTimers (internal/timers.js:494:7)'
]

chrisdostert avatar Aug 05 '20 22:08 chrisdostert

I'm not able to reproduce multi-second blocks of the event loop.

I was able to sometime get long blocks (up to ~750ms) at this same location; but those did not occur consistently -- ~20ms of blocking was more common.

Here's an example of what was logged; note that the blocking was always logged after "Successfully created profile WALL.":

@google-cloud/profiler Successfully created profile WALL.
Blocked for 742.9036171875ms, operation started here: [
  '    at Profiler.collectProfile (/node_modules/@google-cloud/profiler/build/src/profiler.js:298:31)',
  '    at Profiler.runLoop (/node_modules/@google-cloud/profiler/build/src/profiler.js:285:40)',
  '    at listOnTimeout (internal/timers.js:549:17)',
c  '    at processTimers (internal/timers.js:492:7)'
]

Even though the line numbers in the stack trace suggest that the long delay may occur at the line which sends the CreateProfile request, the fact that I only see this when a wall profile is being collected suggests it may be something about the collection of wall profiles.

@chrisdostert -- If you would be able to enable debug logging (documented here; if modifying the source code isn't feasible, setting the environment variable GCLOUD_PROFILER_LOGLEVEL will also work) and check when within the logs the messages about long blocks are printed, that would be helpful. I'm hoping that I'm reproducing the issue you're observing, but it would be nice to confirm.

nolanmar511 avatar Aug 10 '20 16:08 nolanmar511

I added additional logging to local versions of cloud-profiler-nodejs and pprof-nodejs.

It looks like the blocking happens just after the call into native code to begin profile collection (here in pprof-nodejs).

I definitely see longer blocks on alpine-linux than I do when running in an non-alpine linux docker image (30-80ms, ve ~300ms).

My next step is to make a smaller reproducer which depends only on pprof-nodejs (since that seems to be where the issue is).

nolanmar511 avatar Aug 11 '20 18:08 nolanmar511

Yesterday, I added logging throughout pprof-nodejs, and was able to confirm that sometimes the method used to start the V8 CPU profiler takes a long time to return:

05:22:38.187643595: Calling cpuProfiler->StartProfiling
05:22:38.411110419: Finished calling cpuProfiler->StartProfiling
05:22:38.412645170: Finished calling startProfiling
05:22:38.414679435: Thu Aug 13 2020 05:22:38 GMT+0000 (Coordinated Universal Time): Blocked for 228.049232421875ms, operation started here: [
05:22:38.416546091: '    at Profiler.collectProfile (/cloud-profiler-nodejs/build/src/profiler.js:298:31)',
05:22:38.417701251: '    at Profiler.runLoop (/cloud-profiler-nodejs/build/src/profiler.js:285:40)',
05:22:38.419203220: '    at listOnTimeout (internal/timers.js:551:17)',
05:22:38.420860367: '    at processTimers (internal/timers.js:494:7)'
05:22:38.422373919: ] PROMISE undefined

The logging for "Calling cpuProfiler->StartProfiling" and "Finished calling cpuProfiler->StartProfiling" was added around the line on which the V8 CPU profiler was started.

I'm still working on creating a smaller reproducer (something which depends only on pprof-nodejs).

nolanmar511 avatar Aug 13 '20 15:08 nolanmar511

I'm actually less worried on a reproducer; I think I should look into seeing if I can make all calls into C++ asynchronous in pprof-nodejs.

Then, issues like this won't block the event loop.

nolanmar511 avatar Aug 13 '20 16:08 nolanmar511

Making the call to start the V8 CPU profiler asynchronous isn't option.

This NAN example has the following comment:

  // Executed inside the worker-thread.
  // It is not safe to access V8, or V8 data structures
  // here, so everything we need for input and output
  // should go on `this`.
  void Execute () {
...

To make starting the CPU profiler asynchronous (following traditional patterns), we would need to start the V8 profiler within the "Execute" method. So, we can't use standard NAN helpers to call cpuProfiler->StartProfiling asynchronously.

It still might be possible to do this... Maybe we can spawn a C++ thread which calls StartProfiling and wait for that thread to finish within Execute.

Before I continue looking in to making out call to the V8 profiler asynchronous, though, I'm going to look a bit into the V8 CPU profiler.

nolanmar511 avatar Aug 13 '20 21:08 nolanmar511

@psmarshall -- Could it be expected that CpuProfiler::StartProfiling sometimes takes on the order of 100ms-1s?

If it's within the range of expected behaviors, then I really want to make our call to start the CPU profiler asynchronous. But if it's not expected, then I might be better off looking at optimizing CpuProfiler::StartProfiling.

nolanmar511 avatar Aug 13 '20 21:08 nolanmar511

I think CpuProfiler::StartProfiling does some JIT code logging / dumps so that may need to be optimized. I think we touched on this in b/112464936#comment12 before. It would be good to profile with Linux perf or a similar profiler the StartProfiling function using node built with debug information (but optimized binary of course).

aalexand avatar Aug 13 '20 22:08 aalexand

Yes it's expected that CpuProfiler::StartProfiling takes a few hundred ms, we've seen that before for big sites e.g. facebook.com. I am pretty sure the slow part is where we iterate the heap and log some metadata about each function in ExistingCodeLogger::LogCompiledFunctions.

I don't remember ever trying to optimize the startup so there could be wins to find there.

We do all this logging on startup on the main thread because we need to access the v8 heap and copy all of the metadata we use for symbolizing functions during profiling to an off-heap data structure, because we access it from a background thread during profiling to keep the overhead of profiling itself low. Given this work has to happen on the main thread I don't think making it async is an option.

psmarshall avatar Aug 20 '20 09:08 psmarshall

Here is a repro just in V8 tests BTW: https://chromium-review.googlesource.com/c/v8/v8/+/2366695

Profiler startup takes ~4.8 seconds with 500k functions on the heap

psmarshall avatar Aug 20 '20 11:08 psmarshall

https://github.com/nodejs/diagnostics/issues/444 is related to this I think.

aalexand avatar Nov 19 '20 17:11 aalexand