node icon indicating copy to clipboard operation
node copied to clipboard

Crash at `uv_write2` with `Assertion failed` on Darwin

Open Kamillaova opened this issue 7 months ago • 4 comments

Version

v24.1.0

Platform

Darwin host.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64 Darwin

Subsystem

net

What steps will reproduce the bug?

I don't know

How often does it reproduce? Is there a required condition?

Every 1-5 days and I don't know after what it crashes

What is the expected behavior? Why is that the expected behavior?

It shouldn't crash.

What do you see instead?

https://gist.github.com/Kamillaova/6814bcefb3b3ebdb3da3836614b8f5f6

Assertion failed: (n <= stream->write_queue_size), function uv__write_req_update, file src/unix/stream.c, line 694.
Process 81494 stopped
* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #4: 0x00000001044935dc libuv.1.dylib`uv__write + 780
libuv.1.dylib`uv__write:
->  0x1044935dc <+780>: adr    x0, #0xb61d               ; "uv__write"
    0x1044935e0 <+784>: nop
    0x1044935e4 <+788>: adr    x1, #0xb007               ; "src/unix/stream.c"
    0x1044935e8 <+792>: nop
Target 0: (node) stopped.
(lldb) bt
* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #0: 0x0000000189c76a60 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000189caec20 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x0000000189bbba30 libsystem_c.dylib`abort + 180
    frame #3: 0x0000000189bbad20 libsystem_c.dylib`__assert_rtn + 284
  * frame #4: 0x00000001044935dc libuv.1.dylib`uv__write + 780
    frame #5: 0x000000010449319c libuv.1.dylib`uv_write2 + 428
    frame #6: 0x0000000100267e80 node`non-virtual thunk to node::LibuvStreamWrap::DoWrite(node::WriteWrap*, uv_buf_t*, unsigned long, uv_stream_s*) + 68
    frame #7: 0x000000010025f628 node`node::StreamBase::Write(uv_buf_t*, unsigned long, uv_stream_s*, v8::Local<v8::Object>, bool) + 444
    frame #8: 0x00000001002602b0 node`node::StreamBase::WriteBuffer(v8::FunctionCallbackInfo<v8::Value> const&) + 604
    frame #9: 0x0000000100261580 node`void node::StreamBase::JSMethod<&node::StreamBase::WriteBuffer(v8::FunctionCallbackInfo<v8::Value> const&)>(v8::FunctionCallbackInfo<v8::Value> const&) + 188
    frame #10: 0x00000001010a2818 node`Builtins_CallApiCallbackGeneric + 152
    frame #11: 0x000000014813c0cc
    frame #12: 0x0000000148039d38
    frame #13: 0x000000014803a348
    frame #14: 0x00000001485d46d4
    frame #15: 0x000000014850d29c
    frame #16: 0x00000001010df648 node`Builtins_AsyncFunctionAwaitResolveClosure + 72
    frame #17: 0x00000001011b45f8 node`Builtins_PromiseFulfillReactionJob + 56
    frame #18: 0x00000001010cea50 node`Builtins_RunMicrotasks + 560
    frame #19: 0x000000010109e7b0 node`Builtins_JSRunMicrotasksEntry + 176
    frame #20: 0x00000001004e13bc node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 1508
    frame #21: 0x00000001004e1dec node`v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 104
    frame #22: 0x00000001004e1fb0 node`v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*) + 72
    frame #23: 0x00000001005170c0 node`v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) + 356
    frame #24: 0x0000000100517b04 node`v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*) + 128
    frame #25: 0x000000010002923c node`node::InternalCallbackScope::Close() + 332
    frame #26: 0x0000000100028bec node`node::InternalCallbackScope::~InternalCallbackScope() + 20
    frame #27: 0x00000001000ac9b0 node`node::Environment::RunTimers(uv_timer_s*) + 792
    frame #28: 0x00000001044823f0 libuv.1.dylib`uv__run_timers + 132
    frame #29: 0x0000000104486114 libuv.1.dylib`uv_run + 612
    frame #30: 0x0000000100029fcc node`node::SpinEventLoopInternal(node::Environment*) + 272
    frame #31: 0x0000000100178b24 node`node::NodeMainInstance::Run() + 268
    frame #32: 0x00000001000dc960 node`node::Start(int, char**) + 792
    frame #33: 0x00000001899260e0 dyld`start + 2360
(lldb)

Additional information

https://github.com/mtcute/mtcute/blob/73e32b831a678d84640c8b9f9a5fe85f7fd8b30e/packages/core/src/network/session-connection.ts#L1645

Kamillaova avatar Jun 15 '25 22:06 Kamillaova

Could you please provide a minimal repro case?

juanarbol avatar Jun 15 '25 22:06 juanarbol

I don't think so, at least for now, since the problem is very flaky and occurs deep in complex code.

Is it possible to get js stacktrace when a crash occurs in native code?

Kamillaova avatar Jun 15 '25 22:06 Kamillaova

Also, the problem probably introduced after upgrading from 23 to 24 node.

Kamillaova avatar Jun 15 '25 22:06 Kamillaova

Is it possible to get js stacktrace when a crash occurs in native code?

If JS-land throws, yes.

Do you have more certainties? It's quite difficult to debug and fix your problem with that information.

juanarbol avatar Jun 15 '25 23:06 juanarbol

Ping to @nodejs/libuv

~The only assert in unix/stream.c is this one https://github.com/libuv/libuv/blob/0d4f54f0f6a803d7ec06c647d4899e94cd8607c4/src/unix/stream.c#L846~

This seems to be a mismatch in the write queue and its size, but have no idea why. I think this is a very nasty race condition

juanarbol avatar Jun 16 '25 01:06 juanarbol

Sadly I can't investigate that much with your provided information.

By looking at the changelog of 1.50.0 to v1.51.0 I don't see anything problematic, I don't this is a bug in 1.51.0.

I don't think this diff is a proper fix, but this is gonna avoid the crash, although the crash is telling us that something really bad is happening. No bueno

diff --git a/src/unix/stream.c b/src/unix/stream.c
index 204f5a25..159ca4d0 100644
--- a/src/unix/stream.c
+++ b/src/unix/stream.c
@@ -855,6 +855,10 @@ static void uv__write(uv_stream_t* stream) {
     if (uv__queue_empty(&stream->write_queue))
       return;
 
+    /* Refs: https://github.com/nodejs/node/issues/58720 */
+    if (stream->write_queue_size == 0)
+      return;
+
     q = uv__queue_head(&stream->write_queue);
     req = uv__queue_data(q, uv_write_t, queue);
     assert(req->handle == stream);

juanarbol avatar Jun 16 '25 02:06 juanarbol

If JS-land throws, yes.

And if an assertion (and SIGABRT) occurs in native code, then I won't be able to?

Kamillaova avatar Jun 17 '25 16:06 Kamillaova

I think I can add more detailed logging to _doFlush (and deeper), will probably try to do that by the end of this week/beginning of next

I'll also try to update to 24.2.0

Kamillaova avatar Jun 17 '25 16:06 Kamillaova

I won't investigate further w/out a Minimal reproducible example .

juanarbol avatar Jun 19 '25 03:06 juanarbol

As there's not answer from OP, I'm closing this one.

juanarbol avatar Jul 06 '25 04:07 juanarbol