node icon indicating copy to clipboard operation
node copied to clipboard

Crash during ThreadSafeFunction finalization

Open stefandtu opened this issue 2 years ago • 4 comments

Version

v.18.15.0

Platform

all (Win, MacOs)

Subsystem

node api

What steps will reproduce the bug?

electron_test_tsfn_node_integrated_mode.zip electrone_25_crash_tsfn

Close app window or press "Release" button

How often does it reproduce? Is there a required condition?

electron_node.log

[pid 628060, tid: 707296 ] --------------FreeEnvironment start : 000008820120C000 uv_loop: 00007FF740548E28 [pid 628060, tid: 707296 ] --------------FreeEnvironment set_stopping(true): 000008820120C000 [pid 628060, tid: 707296 ] --------------FreeEnvironment before RunCleanup : 000008820120C000 // Here, the released function is being finalized from a foreign context [pid 628060, tid: 707296 ] ThreadSafeFunction Finalize() f_name: test_tsfn Environment* :0000088200A9C000 Isolate: 0000088200834000 this=0000088200A7D400 [pid 628060, tid: 707296 ] env (param): 0000088200A9C000 env isolate ( env->isolate() ) : 0000088200834000 isolate to env ( Environment::GetCurrent(env->isolate()) ) : 000008820120C000 !!! Critical error Environment::GetCurrent(env->isolate()) not eq to env 0000088200A9C000 != 000008820120C000 .....

What is the expected behavior? Why is that the expected behavior?

No response

What do you see instead?

Absence of process abort

Additional information

My quick bug fix: /main/src/node_api.cc

void Finalize() {

v8::HandleScope scope(env->isolate); 
// <==  fix
v8::Context::Scope context_scope(env->context());    
// => fix
if (finalize_cb) {
  CallbackScope cb_scope(this);
  env->CallFinalizer<false>(finalize_cb, finalize_data, context);
}
EmptyQueueAndDelete();

}

stefandtu avatar Aug 14 '23 15:08 stefandtu

Hello. It looks very much like this error is a result of Electron using multiple Envs in a single thread.

https://github.com/electron/electron/issues/36858 https://github.com/electron/electron/pull/38754

stefandtu avatar Aug 22 '23 22:08 stefandtu

@stefandtu hi, do you think this issue is resolved or is there anything we can do on Node.js side?

legendecas avatar Sep 22 '23 15:09 legendecas

Hi! I'm unsure if the problem has been resolved. At the moment, I've applied a workaround on the node.js side, given my limited control over the other code (electron and js). I'm not sure whether node.js should allow the creation of multiple environments within a single thread, or if this is an issue with the electron approach.

I'm trying to switch the context to the correct one (via the call to create_valid_context_scope_if_required) when I notice the current context breaking and executing tasks from the queue that belong to other contexts. node_api.zip

stefandt avatar Sep 29 '23 13:09 stefandt

I'm not sure whether node.js should allow the creation of multiple environments within a single thread, or if this is an issue with the electron approach.

There's a legitimate use case for this, and architecturally it's not much different from worker threads. Electron needs this because it allows them to take advantage of Chromium's practice of reusing processes instead of having to patch it out and suffering performance penalties.

The problem, though, is that (in my case) I'm seeing an issue in Electron with use of ThreadSafeFunction that I'm not seeing in my test scripts that use worker threads. I'm trying to make a native module that can run in an Electron renderer process — meaning it must be context-aware. The first time I reload a browser window, it works fine; the second time it hangs while waiting for handles to close:

* thread #1, name = 'CrRendererMain', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x000000010dceb2cc Electron Framework`uv__async_io [inlined] uv__queue_insert_tail(h=0x0000000116750d60, q=0x00000001048a4298) at queue.h:80:16 [opt]
    frame #1: 0x000000010dceb2c8 Electron Framework`uv__async_io(loop=0x0000000116750b08, w=<unavailable>, events=<unavailable>) at async.c:165:5 [opt]
    frame #2: 0x000000010dcfbe4c Electron Framework`uv__io_poll(loop=0x0000000116750b08, timeout=0) at kqueue.c:0 [opt]
    frame #3: 0x000000010dceb7c0 Electron Framework`uv_run(loop=0x0000000116750b08, mode=UV_RUN_ONCE) at core.c:447:5 [opt]
    frame #4: 0x00000001149b4d80 Electron Framework`node::Environment::CleanupHandles(this=0x0000013800c93800) at env.cc:1115:5 [opt]

Not being a systems programmer by trade, all I can ascertain is that my use of Napi::ThreadSafeFunction is the culprit. If I do all the same work without using a ThreadSafeFunction, I can reload my window as often as I want. I've tried just about every version of cleanup I can think of or research, but none seem to improve the situation.

Anyway, just chiming in to say that there's certainly something that needs to be fixed (though I'm not sure whose responsibility it would be), and it seems like the ZIP attached to this issue is just as valid as whatever reduced test case I could write.

savetheclocktower avatar Oct 16 '24 21:10 savetheclocktower