Crash at `uv_write2` with `Assertion failed` on Darwin
Version
v24.1.0
Platform
Darwin host.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64 Darwin
Subsystem
net
What steps will reproduce the bug?
I don't know
How often does it reproduce? Is there a required condition?
Every 1-5 days and I don't know after what it crashes
What is the expected behavior? Why is that the expected behavior?
It shouldn't crash.
What do you see instead?
https://gist.github.com/Kamillaova/6814bcefb3b3ebdb3da3836614b8f5f6
Assertion failed: (n <= stream->write_queue_size), function uv__write_req_update, file src/unix/stream.c, line 694.
Process 81494 stopped
* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = hit program assert
frame #4: 0x00000001044935dc libuv.1.dylib`uv__write + 780
libuv.1.dylib`uv__write:
-> 0x1044935dc <+780>: adr x0, #0xb61d ; "uv__write"
0x1044935e0 <+784>: nop
0x1044935e4 <+788>: adr x1, #0xb007 ; "src/unix/stream.c"
0x1044935e8 <+792>: nop
Target 0: (node) stopped.
(lldb) bt
* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = hit program assert
frame #0: 0x0000000189c76a60 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x0000000189caec20 libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x0000000189bbba30 libsystem_c.dylib`abort + 180
frame #3: 0x0000000189bbad20 libsystem_c.dylib`__assert_rtn + 284
* frame #4: 0x00000001044935dc libuv.1.dylib`uv__write + 780
frame #5: 0x000000010449319c libuv.1.dylib`uv_write2 + 428
frame #6: 0x0000000100267e80 node`non-virtual thunk to node::LibuvStreamWrap::DoWrite(node::WriteWrap*, uv_buf_t*, unsigned long, uv_stream_s*) + 68
frame #7: 0x000000010025f628 node`node::StreamBase::Write(uv_buf_t*, unsigned long, uv_stream_s*, v8::Local<v8::Object>, bool) + 444
frame #8: 0x00000001002602b0 node`node::StreamBase::WriteBuffer(v8::FunctionCallbackInfo<v8::Value> const&) + 604
frame #9: 0x0000000100261580 node`void node::StreamBase::JSMethod<&node::StreamBase::WriteBuffer(v8::FunctionCallbackInfo<v8::Value> const&)>(v8::FunctionCallbackInfo<v8::Value> const&) + 188
frame #10: 0x00000001010a2818 node`Builtins_CallApiCallbackGeneric + 152
frame #11: 0x000000014813c0cc
frame #12: 0x0000000148039d38
frame #13: 0x000000014803a348
frame #14: 0x00000001485d46d4
frame #15: 0x000000014850d29c
frame #16: 0x00000001010df648 node`Builtins_AsyncFunctionAwaitResolveClosure + 72
frame #17: 0x00000001011b45f8 node`Builtins_PromiseFulfillReactionJob + 56
frame #18: 0x00000001010cea50 node`Builtins_RunMicrotasks + 560
frame #19: 0x000000010109e7b0 node`Builtins_JSRunMicrotasksEntry + 176
frame #20: 0x00000001004e13bc node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 1508
frame #21: 0x00000001004e1dec node`v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 104
frame #22: 0x00000001004e1fb0 node`v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*) + 72
frame #23: 0x00000001005170c0 node`v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) + 356
frame #24: 0x0000000100517b04 node`v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*) + 128
frame #25: 0x000000010002923c node`node::InternalCallbackScope::Close() + 332
frame #26: 0x0000000100028bec node`node::InternalCallbackScope::~InternalCallbackScope() + 20
frame #27: 0x00000001000ac9b0 node`node::Environment::RunTimers(uv_timer_s*) + 792
frame #28: 0x00000001044823f0 libuv.1.dylib`uv__run_timers + 132
frame #29: 0x0000000104486114 libuv.1.dylib`uv_run + 612
frame #30: 0x0000000100029fcc node`node::SpinEventLoopInternal(node::Environment*) + 272
frame #31: 0x0000000100178b24 node`node::NodeMainInstance::Run() + 268
frame #32: 0x00000001000dc960 node`node::Start(int, char**) + 792
frame #33: 0x00000001899260e0 dyld`start + 2360
(lldb)
Additional information
https://github.com/mtcute/mtcute/blob/73e32b831a678d84640c8b9f9a5fe85f7fd8b30e/packages/core/src/network/session-connection.ts#L1645
Could you please provide a minimal repro case?
I don't think so, at least for now, since the problem is very flaky and occurs deep in complex code.
Is it possible to get js stacktrace when a crash occurs in native code?
Also, the problem probably introduced after upgrading from 23 to 24 node.
Is it possible to get js stacktrace when a crash occurs in native code?
If JS-land throws, yes.
Do you have more certainties? It's quite difficult to debug and fix your problem with that information.
Ping to @nodejs/libuv
~The only assert in unix/stream.c is this one https://github.com/libuv/libuv/blob/0d4f54f0f6a803d7ec06c647d4899e94cd8607c4/src/unix/stream.c#L846~
This seems to be a mismatch in the write queue and its size, but have no idea why. I think this is a very nasty race condition
Sadly I can't investigate that much with your provided information.
By looking at the changelog of 1.50.0 to v1.51.0 I don't see anything problematic, I don't this is a bug in 1.51.0.
I don't think this diff is a proper fix, but this is gonna avoid the crash, although the crash is telling us that something really bad is happening. No bueno
diff --git a/src/unix/stream.c b/src/unix/stream.c
index 204f5a25..159ca4d0 100644
--- a/src/unix/stream.c
+++ b/src/unix/stream.c
@@ -855,6 +855,10 @@ static void uv__write(uv_stream_t* stream) {
if (uv__queue_empty(&stream->write_queue))
return;
+ /* Refs: https://github.com/nodejs/node/issues/58720 */
+ if (stream->write_queue_size == 0)
+ return;
+
q = uv__queue_head(&stream->write_queue);
req = uv__queue_data(q, uv_write_t, queue);
assert(req->handle == stream);
If JS-land throws, yes.
And if an assertion (and SIGABRT) occurs in native code, then I won't be able to?
I think I can add more detailed logging to _doFlush (and deeper), will probably try to do that by the end of this week/beginning of next
I'll also try to update to 24.2.0
I won't investigate further w/out a Minimal reproducible example .
As there's not answer from OP, I'm closing this one.