node icon indicating copy to clipboard operation
node copied to clipboard

`Segmentation fault` when executing worker with `eval: true` containing dynamic import after `await`

Open sapphi-red opened this issue 3 years ago • 7 comments

Version

v16.15.0

Platform

Microsoft Windows NT 10.0.19044.0 x64 / WSL: Linux PC 4.4.0-19041-Microsoft #1237-Microsoft Sat Sep 11 14:32:00 PST 2021 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

  1. Download https://gist.github.com/sapphi-red/a0179b52ebd3b6a19f51743594810ecf
  2. Run node eval_cjs.mjs. It exits with non-0 exit code. a. On windows, it does not show any error messages. The output of echo $LASTEXITCODE is -1073741819. b. On WSL (Ubuntu), it shows Segmentation fault (core dumped). The output of echo $? is 139.
  3. Run node eval_cjs_cjs.cjs. It exits with non-0 exit code same with eval_cjs.mjs.
  4. Run noneval_cjs.mjs. It finishes without any errors.
  5. Run noneval_mjs.mjs. It finishes without any errors.

The difference between the files are:

filename parent script worker script is eval result
eval_cjs.mjs ESM CJS O fail
eval_cjs_cjs.cjs CJS CJS O fail
noneval_cjs.mjs ESM CJS X success
noneval_mjs.mjs ESM ESM X success

How often does it reproduce? Is there a required condition?

It reproduces every time on my machine. const startAt = Date.now() + 10 * 1000 might differ between machines.

I suppose at least it needs to meet the following condition:

  • worker is created with eval: true
  • dynamic import is called after await

I am not sure why it requires 10 seconds delay.

What is the expected behavior?

Segmentation fault not happening.

What do you see instead?

Segmentation fault is happening.

Additional information

context: I was tring to fix https://github.com/vitejs/vite/pull/8049#issuecomment-1135919637. (FailureMessage one is not related to this.)

sapphi-red avatar May 25 '22 13:05 sapphi-red

Thanks for the report. I can also reproduce with v18.2.0. The stack trace looks like this:

(lldb) bt
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000100097270 node`node::loader::ImportModuleDynamically(v8::Local<v8::Context>, v8::Local<v8::Data>, v8::Local<v8::Value>, v8::Local<v8::String>, v8::Local<v8::FixedArray>) + 608
    frame #1: 0x00000001003afcb6 node`v8::internal::Isolate::RunHostImportModuleDynamicallyCallback(v8::internal::Handle<v8::internal::Script>, v8::internal::Handle<v8::internal::Object>, v8::internal::MaybeHandle<v8::internal::Object>) + 886
    frame #2: 0x0000000100821ff5 node`v8::internal::Runtime_DynamicImportCall(int, unsigned long*, v8::internal::Isolate*) + 293
    frame #3: 0x0000000100c11a34 node`Builtins_CEntry_Return1_DontSaveFPRegs_ArgvInRegister_NoBuiltinExit + 52
    frame #4: 0x0000000100cb0d0e node`Builtins_CallRuntimeHandler + 78
    frame #5: 0x0000000100b93c4f node`Builtins_InterpreterEntryTrampoline + 207
    frame #6: 0x0000000100bc8e3f node`Builtins_AsyncFunctionAwaitResolveClosure + 63
    frame #7: 0x0000000100c62d38 node`Builtins_PromiseFulfillReactionJob + 56
    frame #8: 0x0000000100bba4c4 node`Builtins_RunMicrotasks + 644
    frame #9: 0x0000000100b91c83 node`Builtins_JSRunMicrotasksEntry + 131
    frame #10: 0x0000000100391f70 node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 2944
    frame #11: 0x0000000100392533 node`v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 83
    frame #12: 0x0000000100392721 node`v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*, v8::internal::MaybeHandle<v8::internal::Object>*) + 81
    frame #13: 0x00000001003bda7b node`v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) + 315
    frame #14: 0x00000001003be2b1 node`v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*) + 97
    frame #15: 0x0000000100001dfe node`node::InternalCallbackScope::Close() + 462
    frame #16: 0x00000001000017de node`node::InternalCallbackScope::~InternalCallbackScope() + 14
    frame #17: 0x000000010006c232 node`node::Environment::RunTimers(uv_timer_s*) + 578
    frame #18: 0x0000000100b6f186 node`uv__run_timers + 54
    frame #19: 0x0000000100b73b07 node`uv_run + 247
    frame #20: 0x0000000100002f9f node`node::SpinEventLoop(node::Environment*) + 287
    frame #21: 0x000000010017da98 node`node::worker::Worker::Run() + 2056
    frame #22: 0x0000000100181472 node`node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::$_3::__invoke(void*) + 50
    frame #23: 0x00007fff7e2722eb libsystem_pthread.dylib`_pthread_body + 126
    frame #24: 0x00007fff7e275249 libsystem_pthread.dylib`_pthread_start + 66
    frame #25: 0x00007fff7e27140d libsystem_pthread.dylib`thread_start + 13

From looking at the disassembly I expect the nullptr dereference is around here: https://github.com/nodejs/node/blob/1531ef1c0c30b35b764981683383187261c97779/src/module_wrap.cc#L585-L605

bnoordhuis avatar May 26 '22 08:05 bnoordhuis

I have located the code location that triggers the null pointer exception: https://github.com/nodejs/node/compare/master...leizongmin:fix/issue-43205

                     ->Uint32Value(context)
                     .ToChecked();
   if (type == ScriptType::kScript) {
-    contextify::ContextifyScript* wrap = env->id_to_script_map.find(id)->second;
-    object = wrap->object();
+    auto it = env->id_to_script_map.find(id);
+    CHECK_NE(it, env->id_to_script_map.end());
+    object = it->second->object();
   } else if (type == ScriptType::kModule) {
     ModuleWrap* wrap = ModuleWrap::GetFromID(env, id);
     object = wrap->object();

Below is the simplified code that reproduces the problem in versions v16.14.2 and v19.0.0-pre:

import { Worker } from "worker_threads";

const code = `
(async function () {
  await new Promise((resolve) => setTimeout(resolve, 10 * 1000));
  import('worker_threads');
})();
`;

const worker = new Worker(code, { eval: true });

After adding the null pointer check, the error message is like this:

node[5913]: ../../src/module_wrap.cc:595:v8::MaybeLocal<v8::Promise> node::loader::ImportModuleDynamically(v8::Local<v8::Context>, v8::Local<v8::Data>, v8::Local<v8::Value>, v8::Local<v8::String>, v8::Local<v8::FixedArray>): Assertion `(it) != (env->id_to_script_map.end())' failed.
 1: 0x55f5348bd284 node::Abort() [node]
 2: 0x55f5348bd318  [node]
 3: 0x55f53487286c  [node]
 4: 0x55f534c69a30 v8::internal::Isolate::RunHostImportModuleDynamicallyCallback(v8::internal::Handle<v8::internal::Script>, v8::internal::Handle<v8::internal::Object>, v8::internal::MaybeHandle<v8::internal::Object>) [node]
 5: 0x55f53510aa86 v8::internal::Runtime_DynamicImportCall(int, unsigned long*, v8::internal::Isolate*) [node]
 6: 0x55f535577eb4  [node]

I will continue to focus on this issue.

leizongmin avatar May 30 '22 16:05 leizongmin

I found this issue is related to https://github.com/nodejs/node/issues/38695 and https://github.com/nodejs/node/issues/25424

leizongmin avatar Jun 06 '22 17:06 leizongmin

Simpler reproducible code Check https://github.com/nodejs/node/issues/44211#issuecomment-1271137984 for the fix I came with :)

const vm = require('node:vm')

const code = `
new Promise(resolve => {
  setTimeout(() => {
    gc(); // gc the vm.Script instance, causing executing 'import' in microtask a segment fault
    resolve();
  }, 1);
}).then(() => import('http'));`

// under the hood, a vm.Script instance is created
vm.runInThisContext(code)

ywave620 avatar Oct 08 '22 03:10 ywave620

Related: https://github.com/nodejs/node/issues/47096

transitive-bullshit avatar Mar 15 '23 16:03 transitive-bullshit

Another minimized test case from a different perspective is this: node-segfault.js:

let Module = module.constructor;
Module.wrap = Module.wrap; // put the loader into patched mode
let { dynamicImport } = require('./dynamic-import');
gc();
dynamicImport();

dynamic-import.js:

module.exports.dynamicImport = () => { return import ('./node-segfault.js'); } // dynamically import anything

Setting the Module.wrap puts the loader into "patched" mode, where it uses the vm Script/context object. This creates a ContextifyScript, which records the script ids in the id_to_script_map map, but is also weakly referenced, so the garbage collection causes the id/script to be removed from id_to_script_map when GC'ed, and so the offending line from ImportModuleDynamically attempts to access a script by id that is no longer there.

To help with anyone searching for the segfault isses, modifying/patching Module.wrap is done by some libraries, like the rewire package. And if you came here looking for a terrible and horrible hack to work around this seg fault, you can do this to prevent the GC-induced segfault:

const { Script } = require('vm');
let pinnedScripts = [];
let originalRun = Script.prototype.runInThisContext;
Script.prototype.runInThisContext = function (options) {
	pinnedScripts.push(this);
	return originalRun.call(this, options);
};

Making ContextifyScript not weak also eliminates this issue, but I assume scripts are supposed to be collectable.

kriszyp avatar May 04 '23 12:05 kriszyp

I tested https://github.com/nodejs/node/pull/48510 on the repro in the OP, in https://github.com/nodejs/node/issues/43205#issuecomment-1272209939 and https://github.com/nodejs/node/issues/43205#issuecomment-1534663358 and they are all fixed locally. I'll write a test based on the repo in https://github.com/nodejs/node/issues/43205#issuecomment-1272209939 in the PR because it's simpler

joyeecheung avatar Jun 22 '23 11:06 joyeecheung

To help with anyone searching for the segfault isses, modifying/patching Module.wrap is done by some libraries, like the rewire package. And if you came here looking for a terrible and horrible hack to work around this seg fault, you can do this to prevent the GC-induced segfault:

const { Script } = require('vm');
let pinnedScripts = [];
let originalRun = Script.prototype.runInThisContext;
Script.prototype.runInThisContext = function (options) {
	pinnedScripts.push(this);
	return originalRun.call(this, options);
};

Making ContextifyScript not weak also eliminates this issue, but I assume scripts are supposed to be collectable.

I've been trying to use this workaround but I'm not getting it to stick. I have the exact same issue as here, but with typescript. Forgive me for being a total js/ts beginner, but can you give a wider example of where to put the workaround? Specifically, I have a jest test with a beforeAll. I tried it both in the top of the file as well as in the beforeAll and my jest test setup script (ts), but to no effect.

etnoy avatar Jul 27 '23 21:07 etnoy

@etnoy You might try verifying/debugging that the pinnedScripts are indeed being appended. And I believe you do need to ensure that this is executed before you load the module that will execute a dynamic import. Sorry, I don't have any suggestions besides that.

kriszyp avatar Jul 28 '23 01:07 kriszyp

@etnoy You might try verifying/debugging that the pinnedScripts are indeed being appended. And I believe you do need to ensure that this is executed before you load the module that will execute a dynamic import. Sorry, I don't have any suggestions besides that.

Thanks, will do!

etnoy avatar Jul 28 '23 10:07 etnoy

Closing as https://github.com/nodejs/node/pull/48510 has landed and should fix this. We can re-open if that turns out to be incorrect (hopefully not).

joyeecheung avatar Sep 14 '23 15:09 joyeecheung