http2, tls: check content-length, fix RST and GOAWAY logic
Fixes #35209. Closes #35378.
Time spent: 90h
> [!CAUTION]
> These bug fixes are potentially semver-major!
- Fixed `stream.close(NGHTTP2_CANCEL)` closing with `0`, aka `NGHTTP2_NO_ERROR`.
- `serverStream.destroy()` closed with `0`, now `NGHTTP2_INTERNAL_ERROR`.
- `clientStream.destroy()` closed with `0`, now `NGHTTP2_CANCEL`.
- Mismatched `content-length` now throws `NGHTTP2_PROTOCOL_ERROR`, per the spec.
- Fixed `GOAWAY` (server -> client) being greeted with `GOAWAY` (client -> server), as that's against the spec.
- Fixed streams always being closed with `NGHTTP2_CANCEL` on socket close; this now respects `goawayCode` and `destroyCode`. The default is `NGHTTP2_INTERNAL_ERROR` for premature TCP FIN and TCP RST.
- Fixed `stream.rstCode` being `0` while the stream is active - the docs say it should be `undefined`.
- Fixed `sessionCode` being preferred over `rstCode` when destroying a stream with an error.
- Fixed streams being closed with `NO_ERROR` if the session received a `GOAWAY` with `NO_ERROR` and the remote closed the connection.
- Fixed streams being closed with `NO_ERROR` if the session sent a `GOAWAY` with `NO_ERROR` and was destroyed.
- Fixed `GOAWAY` preventing `RST_STREAM` from being sent before the `GOAWAY`. nghttp2 correctly assumes that it should prevent `RST_STREAM` from being sent, but incorrectly applies this to all frames in a packet instead of only the frames queued after the `GOAWAY`. This is not the only thing nghttp2 gets wrong:
  - https://github.com/nghttp2/nghttp2/issues/1166
  - https://github.com/nghttp2/nghttp2/issues/692
- Fixed `benchmark/http2/headers.js` calling `client.destroy()` (resulting in dropped requests). It now calls `client.close()`, which closes gracefully.
- `connect` stream `socket.destroy()` now closes with `NGHTTP2_CONNECT_ERROR`.
- Fixed double `GOAWAY` on `session.close()`.
- Fixed queued `RST_STREAM` and `GOAWAY` frames being dropped if `Http2Session::Close()` was called while previous writes weren't finished yet.
- Fixed stream frame errors closing the entire session instead of just the stream.
- Fixed stream frame errors sending `RST_STREAM` on idle streams (to reproduce, 16. needs to be fixed).
- Fixed https://github.com/nodejs/node/issues/49484
- Fixed the server `close` event not being emitted if the TLS socket RSTs.
- Fixed a deadlock when sending a `GOAWAY` before receiving the server's `GOAWAY`. The session won't close gracefully because the server may never ACK our `GOAWAY`; it now sends a TCP RST after a terminating `GOAWAY`.
Sorry that so many bugs are fixed in a single PR, but it's impossible to fix one without fixing them all!
It would be best if https://github.com/nghttp2/nghttp2/pull/1508 got merged before this, but it has been open for years 😢
Hence the patch for `nghttp2_session.c`.
/cc @jasnell
Review requested:
- [ ] @nodejs/http
- [ ] @nodejs/http2
- [ ] @nodejs/net
- [ ] @nodejs/security-wg
The linter and commit validator are failing; could you check?
> An HTTP response is complete after the server sends -- or the client receives -- a frame with the END_STREAM flag set (including any CONTINUATION frames needed to complete a field block). A server can send a complete response prior to the client sending an entire request if the response does not depend on any portion of the request that has not been sent and received. When this is true, a server MAY request that the client abort transmission of a request without error by sending a RST_STREAM with an error code of NO_ERROR after sending a complete response (i.e., a frame with the END_STREAM flag set). Clients MUST NOT discard responses as a result of receiving such a RST_STREAM, though clients can always discard responses at their discretion for other reasons.
That's interesting. `RST_STREAM` with `NO_ERROR` means: don't send anything more to me, but do process what I sent you. Rapid reset? (Good thing it's fixed in nghttp2.)
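The client-side rule from that spec paragraph boils down to something like this (a sketch with made-up names, not Node's actual handler; `8` is the real `NGHTTP2_CANCEL` code):

```javascript
const NGHTTP2_NO_ERROR = 0;

// Per the spec text above: a RST_STREAM carrying NO_ERROR that arrives after
// a complete response (END_STREAM already seen) only aborts the request body;
// the client MUST NOT discard the response it already has.
function shouldDiscardResponse(rstCode, sawEndStream) {
  return !(rstCode === NGHTTP2_NO_ERROR && sawEndStream);
}

console.log(shouldDiscardResponse(0, true));  // false: keep the response
console.log(shouldDiscardResponse(8, true));  // true: NGHTTP2_CANCEL, discard
console.log(shouldDiscardResponse(0, false)); // true: response was incomplete
```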
@jasnell could you reach out to the nghttp2 maintainers and ask about the status of that PR? I would sincerely hope we wouldn't have to maintain a floating patch.
CI: https://ci.nodejs.org/job/node-test-pull-request/59515/
I got stuck. `test-http2-multistream-destroy-on-read-tls.js` doesn't receive the last RST for `/j`. This happens only on TLS. I used the `keylog` TLS event to get the key and Wireshark to inspect the traffic, and I was amazed to see this:
Aside from the missing RST, there's no GOAWAY frame from the client. Even if I restore `node_http2.cc:794` to be

```cpp
CHECK_EQ(nghttp2_session_terminate_session(session_.get(), code), 0);
```

the result is the same - no GOAWAY frame. I don't know how nghttp2 builds frames, nor how the streams work internally, so I would greatly appreciate any help here!
~~After digging it looks like there's a bug in the queue mechanism in nghttp2?~~
Edit: For some reason the session addresses differ? Investigating! (I won't ping nghttp2 yet till I know it's definitely in nghttp2)
```
PUSH RST_STREAM TO QUEUE session=0x5a831ddf2128 ob_reg=0x5a831ddf2240 item=0x5a831dc43b88
pointer 0x5a831dc43b88
to head
node goaway, nghttp2 session=0x5a831ddf2128
pop session=0x5a831de25c08 ob_reg=0x5a831de25d20 top=(nil)
```
Have you tried pinging nghttp2?
Ok, I know what's happening:

```
submit rst stream
adding rst stream
PUSH RST_STREAM TO QUEUE session=0x60cd23c980e8 ob_reg=0x60cd23c98200 item=0x60cd23979b88
pointer 0x60cd23979b88
to head
goaway
node goaway, nghttp2 session=0x60cd23c980e8
sending pending data session=0x60cd23c980e8
sending, returning
write finished
sending pending data session=0x60cd23c980e8
destroyed, returning
sending pending data session=0x60cd23c980e8
destroyed, returning
```
It's a bug in Node.js. It can't send the next `RST_STREAM` and `GOAWAY` because it's still sending the previous ones. Relevant code:

```cpp
if (!socket_closed) {
  Debug(this, "terminating session with code %d", code);
  CHECK_EQ(nghttp2_session_terminate_session(session_.get(), code), 0);
  SendPendingData();  // fails because it's already sending the previous frames
} else if (stream_ != nullptr) {
  stream_->RemoveStreamListener(this);
}

set_destroyed();  // gets destroyed before the write finishes
// The queued writes are never sent because the session was destroyed above:
// OnStreamAfterWrite -> MaybeScheduleWrite never runs once the stream is destroyed!
```
I think I know how to fix it.
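To convince myself the sequencing above is what drops the queued frames, here's a tiny self-contained model of the flow (illustrative JavaScript, not the actual C++ internals; `ToySession` and all its members are made up, and `deferDestroy` models one possible fix of flushing before destroying):

```javascript
// Toy model of the bug: marking the session destroyed while a write is in
// flight means the after-write hook never reschedules, so queued
// RST_STREAM/GOAWAY frames are silently dropped.
class ToySession {
  constructor({ deferDestroy }) {
    this.deferDestroy = deferDestroy; // sketch of the fix: flush, then destroy
    this.queue = [];
    this.writing = false;
    this.destroyed = false;
    this.pendingDestroy = false;
    this.wire = []; // frames that actually reached the socket
  }
  submit(frame) {
    this.queue.push(frame);
    this.sendPendingData();
  }
  sendPendingData() {
    if (this.writing || this.destroyed || this.queue.length === 0) return;
    this.writing = true;                 // async write "to the socket"
    this.wire.push(this.queue.shift());
  }
  onStreamAfterWrite() {                 // runs when the in-flight write completes
    this.writing = false;
    if (this.destroyed) return;          // the bug: never reschedules the next write
    if (this.queue.length > 0) {
      this.sendPendingData();
    } else if (this.pendingDestroy) {
      this.destroyed = true;             // safe now: everything was flushed
    }
  }
  destroy() {
    if (this.deferDestroy && (this.writing || this.queue.length > 0)) {
      this.pendingDestroy = true;        // wait for the queue to drain
    } else {
      this.destroyed = true;
    }
  }
}

function run(deferDestroy) {
  const session = new ToySession({ deferDestroy });
  session.submit('RST_STREAM'); // starts an async write
  session.submit('GOAWAY');     // queued behind the in-flight write
  session.destroy();
  while (session.writing) session.onStreamAfterWrite(); // drain completions
  return session.wire;
}

console.log(run(false)); // [ 'RST_STREAM' ]           -- the GOAWAY is lost
console.log(run(true));  // [ 'RST_STREAM', 'GOAWAY' ] -- deferred destroy flushes it
```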
CI: https://ci.nodejs.org/job/node-test-pull-request/59539/
Umm... GCC segfaults on my side when trying to build with ASan for `test-http2-server-rst-stream.js` (that test throws `std::bad_alloc`):

```
g++ -o /home/ubuntu/Desktop/node/out/Release/obj.target/gen-regexp-special-case/deps/v8/src/regexp/gen-regexp-special-case.o ../deps/v8/src/regexp/gen-regexp-special-case.cc '-D_GLIBCXX_USE_CXX11_ABI=1' '-DNODE_OPENSSL_CONF_NAME=nodejs_conf' '-DNODE_OPENSSL_HAS_QUIC' '-DICU_NO_USER_DATA_OVERRIDE' '-DV8_GYP_BUILD' '-DV8_TYPED_ARRAY_MAX_SIZE_IN_HEAP=64' '-DLEAK_SANITIZER' '-DV8_USE_ADDRESS_SANITIZER' '-D__STDC_FORMAT_MACROS' '-DOPENSSL_NO_PINSHARED' '-DOPENSSL_THREADS' '-DV8_TARGET_ARCH_X64' '-DV8_HAVE_TARGET_OS' '-DV8_TARGET_OS_LINUX' '-DV8_EMBEDDER_STRING="-node.12"' '-DENABLE_DISASSEMBLER' '-DV8_PROMISE_INTERNAL_FIELD_COUNT=1' '-DV8_ENABLE_PRIVATE_MAPPING_FORK_OPTIMIZATION' '-DV8_SHORT_BUILTIN_CALLS' '-DOBJECT_PRINT' '-DV8_INTL_SUPPORT' '-DV8_ATOMIC_OBJECT_FIELD_WRITES' '-DV8_ENABLE_LAZY_SOURCE_POSITIONS' '-DV8_USE_SIPHASH' '-DV8_SHARED_RO_HEAP' '-DNDEBUG' '-DV8_WIN64_UNWINDING_INFO' '-DV8_ENABLE_REGEXP_INTERPRETER_THREADED_DISPATCH' '-DV8_USE_ZLIB' '-DV8_ENABLE_SPARKPLUG' '-DV8_ENABLE_MAGLEV' '-DV8_ENABLE_TURBOFAN' '-DV8_ENABLE_WEBASSEMBLY' '-DV8_ENABLE_JAVASCRIPT_PROMISE_HOOKS' '-DV8_ENABLE_CONTINUATION_PRESERVED_EMBEDDER_DATA' '-DV8_ALLOCATION_FOLDING' '-DV8_ALLOCATION_SITE_TRACKING' '-DV8_ADVANCED_BIGINT_ALGORITHMS' '-DUCONFIG_NO_SERVICE=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=1' '-DUCONFIG_NO_BREAK_ITERATION=0' -I../deps/v8 -I../deps/v8/include -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common -fno-omit-frame-pointer -fsanitize=address -fsanitize-address-use-after-scope -pthread -Wno-unused-parameter -Wno-strict-overflow -Wno-return-type -Wno-int-in-bool-context -Wno-deprecated -Wno-stringop-overflow -Wno-stringop-overread -Wno-restrict -Wno-array-bounds -Wno-nonnull -Wno-dangling-pointer -flax-vector-conversions -fno-strict-aliasing -m64 -m64 -fno-omit-frame-pointer -fdata-sections -ffunction-sections -O2 -fno-rtti -fno-exceptions -std=gnu++20 -Wno-invalid-offsetof -MMD -MF /home/ubuntu/Desktop/node/out/Release/.deps//home/ubuntu/Desktop/node/out/Release/obj.target/gen-regexp-special-case/deps/v8/src/regexp/gen-regexp-special-case.o.d.raw -c
Segmentation fault (core dumped)
```
```
$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.2.0-4ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (Ubuntu 13.2.0-4ubuntu3)
```
I'll try upgrading Ubuntu (I hate Ubuntu).
Well, after upgrading Ubuntu, the resolution is now locked to 1024 x 768 (4:3), while my monitor is 16:9. It looks so stretched and pixelated :( I've been hating Ubuntu since at least 2018 (probably since 2016) and I don't recommend it to anyone. It breaks randomly (black screen of death), you can't set the default audio device, can't set default programs for file extensions, and a bunch of other issues I don't remember.
I'll try some other distro tomorrow. I just used Ubuntu because I needed Linux fast some time ago and didn't have time to set up a bare-bones distro from scratch. I used to daily-drive Arch, but it started breaking as well (it was throwing fstab errors, black screen; a reinstall worked, but repeat that 3 times and I got angry again). I never got the Debian installer past the "Installing..." progress bar; it always got stuck.
Edit: lmao, after the upgrade it put itself first on the EFI boot entry list, what a shitty distro. Edit 2: My home server is running Alpine, so I thought I'd use Alpine as well. It turns out that since December 2022 NVIDIA forces you to use glibc 🤦🏼
Time to log off for today!
CI: https://ci.nodejs.org/job/node-test-pull-request/59569/
Even with GCC 14.2 the compiler segfaults:
Logs:

```
../deps/v8/src/compiler/turboshaft/machine-lowering-phase.cc:31:1: internal compiler error: Segmentation fault
   31 | } // namespace v8::internal::compiler::turboshaft
      | ^
AddressSanitizer:DEADLYSIGNAL
=================================================================
==12073==ERROR: AddressSanitizer: SEGV on unknown address 0x10007fff8000 (pc 0x559c5152c002 bp 0x7ffde2507120 sp 0x7ffde2506ff0 T0)
==12073==The signal is caused by a READ memory access.
0x7fe0d72068bf ???
    ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7fe0d724bfa6 _IO_new_file_xsputn
    ./libio/fileops.c:1236
0x7fe0d724bfa6 _IO_new_file_xsputn
    ./libio/fileops.c:1197
0x7fe0d723f926 __GI__IO_fputs
    ./libio/iofputs.c:38
0x7fe0d71f0c4b __libc_start_call_main
    ../sysdeps/nptl/libc_start_call_main.h:58
0x7fe0d71f0d04 __libc_start_main_impl
    ../csu/libc-start.c:360
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://github.com/Homebrew/homebrew-core/issues> for instructions.
    #0 0x559c5152c002 in std::_Hashtable<v8::internal::torque::Item, v8::internal::torque::Item, std::allocator<v8::internal::torque::Item>, std::__detail::_Identity, std::equal_to<v8::internal::torque::Item>, v8::base::hash<v8::internal::torque::Item>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<v8::internal::torque::Item, true>*, unsigned long) (/home/szmarczak/node/out/Release/torque+0x394002)
    #1 0x559c51530006 in v8::internal::torque::RunEarleyAlgorithm(v8::internal::torque::Symbol*, v8::internal::torque::LexerResult const&, std::unordered_set<v8::internal::torque::Item, v8::base::hash<v8::internal::torque::Item>, std::equal_to<v8::internal::torque::Item>, std::allocator<v8::internal::torque::Item> >*) (/home/szmarczak/node/out/Release/torque+0x398006)
    #2 0x559c513b00f0 in v8::internal::torque::ParseTorque(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (/home/szmarczak/node/out/Release/torque+0x2180f0)
    #3 0x559c5128a3a0 in v8::internal::torque::CompileTorque(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, v8::internal::torque::TorqueCompilerOptions) (/home/szmarczak/node/out/Release/torque+0xf23a0)
    #4 0x559c51262bd4 in v8::internal::torque::WrappedMain(int, char const**) (/home/szmarczak/node/out/Release/torque+0xcabd4)
    #5 0x559c51258008 in main (/home/szmarczak/node/out/Release/torque+0xc0008)
    #6 0x7f6fa1a3fc4b in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #7 0x7f6fa1a3fd04 in __libc_start_main_impl ../csu/libc-start.c:360
    #8 0x559c51258440 in _start ../sysdeps/x86_64/start.S:115
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/szmarczak/node/out/Release/torque+0x394002) in std::_Hashtable<v8::internal::torque::Item, v8::internal::torque::Item, std::allocator<v8::internal::torque::Item>, std::__detail::_Identity, std::equal_to<v8::internal::torque::Item>, v8::base::hash<v8::internal::torque::Item>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<v8::internal::torque::Item, true>*, unsigned long)
==12073==ABORTING
make[1]: *** [tools/v8_gypfiles/run_torque.target.mk:17: 8f4df544355fcd0a9a2a203eaf6cae85b7846f41.intermediate] Error 1
rm 8f4df544355fcd0a9a2a203eaf6cae85b7846f41.intermediate f7fdbdaa32bca4a9be395232d9d587081a1ca0ee.intermediate ad1f1002e1d464ed9b31278a2e689c1f8df273d2.intermediate
make: *** [Makefile:137: node] Error 2
make: *** Waiting for unfinished jobs....
g++-14 -o /home/szmarczak/node/out/Debug/obj.target/v8_turboshaft/deps/v8/src/compiler/turboshaft/use-map.o ../deps/v8/src/compiler/turboshaft/use-map.cc '-D_GLIBCXX_USE_CXX11_ABI=1' '-DNODE_OPENSSL_CONF_NAME=nodejs_conf' '-DNODE_OPENSSL_HAS_QUIC' '-DICU_NO_USER_DATA_OVERRIDE' '-DV8_GYP_BUILD' '-DV8_TYPED_ARRAY_MAX_SIZE_IN_HEAP=64' '-DLEAK_SANITIZER' '-DV8_USE_ADDRESS_SANITIZER' '-D__STDC_FORMAT_MACROS' '-DOPENSSL_NO_PINSHARED' '-DOPENSSL_THREADS' '-DV8_TARGET_ARCH_X64' '-DV8_HAVE_TARGET_OS' '-DV8_TARGET_OS_LINUX' '-DV8_EMBEDDER_STRING="-node.9"' '-DENABLE_DISASSEMBLER' '-DV8_PROMISE_INTERNAL_FIELD_COUNT=1' '-DV8_ENABLE_PRIVATE_MAPPING_FORK_OPTIMIZATION' '-DV8_SHORT_BUILTIN_CALLS' '-DOBJECT_PRINT' '-DV8_INTL_SUPPORT' '-DV8_ATOMIC_OBJECT_FIELD_WRITES' '-DV8_ENABLE_LAZY_SOURCE_POSITIONS' '-DV8_USE_SIPHASH' '-DV8_SHARED_RO_HEAP' '-DNDEBUG' '-DV8_WIN64_UNWINDING_INFO' '-DV8_ENABLE_REGEXP_INTERPRETER_THREADED_DISPATCH' '-DV8_USE_ZLIB' '-DV8_ENABLE_SPARKPLUG' '-DV8_ENABLE_MAGLEV' '-DV8_ENABLE_TURBOFAN' '-DV8_ENABLE_WEBASSEMBLY' '-DV8_ENABLE_JAVASCRIPT_PROMISE_HOOKS' '-DV8_ENABLE_CONTINUATION_PRESERVED_EMBEDDER_DATA' '-DV8_ALLOCATION_FOLDING' '-DV8_ALLOCATION_SITE_TRACKING' '-DV8_ADVANCED_BIGINT_ALGORITHMS' '-DUCONFIG_NO_SERVICE=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=1' '-DUCONFIG_NO_BREAK_ITERATION=0' '-DDEBUG' '-D_DEBUG' '-DENABLE_HANDLE_ZAPPING' -I../deps/v8 -I../deps/v8/include -I/home/szmarczak/node/out/Debug/obj/gen/generate-bytecode-output-root -I/home/szmarczak/node/out/Debug/obj/gen -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common -I../deps/v8/third_party/abseil-cpp -I../deps/v8/third_party/fp16/src/include -fno-omit-frame-pointer -fsanitize=address -fsanitize-address-use-after-scope -pthread -Wno-unused-parameter -Wno-strict-overflow -Wno-return-type -Wno-int-in-bool-context -Wno-deprecated -Wno-stringop-overflow -Wno-stringop-overread -Wno-restrict -Wno-array-bounds -Wno-nonnull -Wno-dangling-pointer -flax-vector-conversions -m64 -m64 -g -fdata-sections -ffunction-sections -O2 -fno-rtti -fno-exceptions -fno-strict-aliasing -std=gnu++20 -Wno-invalid-offsetof -MMD -MF /home/szmarczak/node/out/Debug/.deps//home/szmarczak/node/out/Debug/obj.target/v8_turboshaft/deps/v8/src/compiler/turboshaft/use-map.o.d.raw -c
make[1]: *** [tools/v8_gypfiles/v8_turboshaft.target.mk:250: /home/szmarczak/node/out/Debug/obj.target/v8_turboshaft/deps/v8/src/compiler/turboshaft/machine-lowering-phase.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:137: node_g] Error 2
```
I'll try with clang.
Ok, clang works, but it's 3:20 am here. I'll come back to this in the afternoon.
I just found a race after enabling RST_STREAM and GOAWAY in the same packet. When the server sends both a RST_STREAM and a terminating GOAWAY in a single packet, the client might send its own GOAWAY just after parsing the RST_STREAM (because it no longer needs the session), but before noticing that there's still a GOAWAY left to parse.
Currently the client sends the GOAWAY, but the server doesn't read it, resulting in a stall on the client side as it waits for the server to acknowledge the client's GOAWAY packet, which will never happen.
I have yet to figure out a solution to this problem.
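Here's a minimal model of that race (illustrative JavaScript, not nghttp2's actual parser; `client`, the strategy names, and the frame strings are all made up). What matters is whether the client finishes parsing the packet before deciding to send its own GOAWAY:

```javascript
// The server's last TCP packet carries both frames.
const packet = ['RST_STREAM', 'GOAWAY'];

// 'eager': send our GOAWAY as soon as the last stream dies (the racy behavior).
// 'patient': parse the whole packet first, then decide.
// Returns true if the session ends up stalled waiting for an ACK.
function client(strategy) {
  let sawServerGoaway = false;
  let sentOwnGoaway = false;
  for (const frame of packet) {
    if (frame === 'GOAWAY') sawServerGoaway = true;
    if (frame === 'RST_STREAM' && strategy === 'eager') {
      sentOwnGoaway = true; // no streams left, close the session...
      break;                // ...and stop parsing the rest of the packet
    }
  }
  if (strategy === 'patient') sentOwnGoaway = true;
  // Deadlock: we sent a GOAWAY but never learned the server is already gone,
  // so we wait forever for an acknowledgement that will never arrive.
  return sentOwnGoaway && !sawServerGoaway;
}

console.log(client('eager'));   // true: stalls
console.log(client('patient')); // false: sees the server's GOAWAY, can tear down
```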
Codecov Report
Attention: Patch coverage is 91.61290% with 13 lines in your changes missing coverage. Please review.
Project coverage is 88.41%. Comparing base (103b843) to head (2f4d66f). Report is 131 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/internal/http2/core.js | 92.24% | 9 Missing :warning: |
| lib/net.js | 90.90% | 2 Missing :warning: |
| src/node_http2.cc | 84.61% | 1 Missing and 1 partial :warning: |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main   #53160      +/-   ##
==========================================
+ Coverage   88.23%   88.41%   +0.18%
==========================================
  Files         651      652       +1
  Lines      183863   186971    +3108
  Branches    35824    36085     +261
==========================================
+ Hits       162235   165317    +3082
+ Misses      14932    14907      -25
- Partials     6696     6747      +51
```
| Files with missing lines | Coverage Δ | |
|---|---|---|
| lib/_tls_wrap.js | 93.40% <100.00%> (+0.01%) | :arrow_up: |
| lib/net.js | 95.08% <90.90%> (+0.85%) | :arrow_up: |
| src/node_http2.cc | 84.33% <84.61%> (-0.82%) | :arrow_down: |
| lib/internal/http2/core.js | 95.34% <92.24%> (-0.20%) | :arrow_down: |
If the above is true, then I believe the client should send a TCP RST when it sends a GOAWAY and receives a GOAWAY afterwards, to prevent stalling.
Hmm... it seems that once the session is closed, nghttp2 won't resume parsing the GOAWAY, so we won't know that there was a GOAWAY from the server. In that case there are two options:
- ~~patch nghttp2 to continue parsing only GOAWAYs and send a TCP RST only if there was a GOAWAY from the server - I'm -1 on this since there may be erroneous frames in the queue,~~
- always send an RST after sending a terminating GOAWAY - I'm +1 on this one.

The downside of RST is that it won't be retransmitted if it isn't delivered. A FIN would be delivered, but the connection would stall because the client's GOAWAY would never be ACKed. So the first option could be better...
Another drawback of the first option is that the server's GOAWAY doesn't necessarily arrive in the same packet where the client decides to end the connection. So, to do this properly, nghttp2 would need to keep parsing frames and ignore all of them except GOAWAY.
I'll go with always sending an RST after sending a terminating GOAWAY.
Blocked by https://github.com/nodejs/node/issues/49484
Found another double destroy bug:
```
NET 24900: has server
Trace: disconnect
    at Socket._destroy (node:net:846:13)
    at _destroy (node:internal/streams/destroy:122:10)
    at Socket.destroy (node:internal/streams/destroy:84:5)
    at Writable.destroy (node:internal/streams/writable:1122:11)
    at TLSWrap.close (node:_tls_wrap:662:24)
    at closeSocketHandle (node:net:341:18)
    at Socket._destroy (node:net:830:7)
    at _destroy (node:internal/streams/destroy:122:10)
    at TLSSocket.destroy (node:internal/streams/destroy:84:5)
    at Writable.destroy (node:internal/streams/writable:1122:11)
NET 24900: SERVER _emitCloseIfDrained
NET 24900: SERVER handle? true connections? 0
NET 24900: has server
Trace: disconnect
    at Socket._destroy (node:net:846:13)
    at _destroy (node:internal/streams/destroy:122:10)
    at TLSSocket.destroy (node:internal/streams/destroy:84:5)
    at Writable.destroy (node:internal/streams/writable:1122:11)
    at Server.closeIdleConnections (node:_http_server:606:27)
    at Http2SecureServer.closeIdleConnections (node:internal/http2/core:3221:7)
    at httpServerPreClose (node:_http_server:527:10)
    at Http2SecureServer.close (node:internal/http2/core:3214:7)
    at main (/home/szmarczak/node/test/parallel/test-http2-allow-http1.js:44:10)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
NET 24900: SERVER _emitCloseIfDrained
NET 24900: SERVER handle? true connections? -1
```
I'll have to fix it in this PR :/
Edit: Actually, it shuts down all handles using the same destroy function. It just needed a guard to decrement the connection counter only at the most low-level handle.
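The guard can be modelled like this (a toy with made-up handle objects; the real fix lives in the net/TLS destroy path). A TLS socket wraps a raw TCP handle, both run through the same destroy function, and only the innermost handle should touch the server's connection counter:

```javascript
// A TLS socket wrapping a raw TCP handle; destroying the outer one also
// tears down the inner one via the same destroy function.
const tcpHandle = { inner: null };
const tlsHandle = { inner: tcpHandle };

// Counts how many times the connection counter would be decremented while
// walking the handle chain during teardown.
function countDisconnects(handle, guarded) {
  let decrements = 0;
  for (let h = handle; h !== null; h = h.inner) {
    // Unguarded: every layer decrements. Guarded: only the most low-level
    // handle (the one with no inner handle) does.
    if (!guarded || h.inner === null) decrements++;
  }
  return decrements;
}

const connections = 1;
console.log(connections - countDisconnects(tlsHandle, false)); // -1: the bug in the trace above
console.log(connections - countDisconnects(tlsHandle, true));  //  0: with the guard
```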
CI: https://ci.nodejs.org/job/node-test-pull-request/62975/
```
$ NODE_DEBUG=http2 out/Debug/node test/parallel/test-http2-server-shutdown-redundant.js
```
macOS fails because at https://github.com/szmarczak/node/blob/c887bc2aeca56aa1adcd22e84600840172614f2e/lib/internal/stream_base_commons.js#L213 `nread` is `ECONNRESET`, while on Linux it's `EOF`.
macOS:
Linux:
I'm not sure why macOS ECONNRESETs here... I'm investigating.
The same on macOS but without TCP RST:
The only difference is that the server FINs instead of the client RSTing. So I guess macOS ~~(or libuv?)~~ prefers RST when both a FIN and an RST are in the recv queue.
Edit: that's correct - if both a FIN and an RST are in the recv queue, it ignores the FIN and emits a read `ECONNRESET` instead of a read `EOF`. I'd say that's a bug in macOS itself.
```js
const net = require('net');
const server = net.createServer({ allowHalfOpen: true }, socket => {}); // macOS: ECONNRESET, Linux: pass

server.listen(() => {
  const socket = net.connect(server.address().port, '127.0.0.1');
  socket.once('connect', () => {
    socket.end(() => socket.resetAndDestroy());
  });
  socket.once('close', () => server.close());
});
```
Definitely either macOS ~~or libuv~~.
Edit: Windows seems to actually send the RST only when the client received data after queueing the RST. It doesn't reset the connection if the server FINs.
```js
const net = require('net');
const server = net.createServer({ allowHalfOpen: true }, socket => {
  setTimeout(() => socket.write('passes'), 100);
  setTimeout(() => socket.write('fails'), 200);
});

server.listen(() => {
  const socket = net.connect(server.address().port, '127.0.0.1');
  socket.pipe(process.stdout);
  socket.once('connect', () => {
    socket.end(() => socket.resetAndDestroy());
  });
  socket.once('close', () => server.close());
});
```
If I'm reading libuv right, it seems like a macOS preference? ~~Edit: and possibly Windows?~~ (I edited the comment above.)
So far, Jenkins also says:

```
node:events:485
      throw er; // Unhandled 'error' event
      ^

Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_CANCEL
    at Http2Stream.onStreamClose (node:internal/http2/core:554:15)
Emitted 'error' event on ServerHttp2Stream instance at:
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
  code: 'ERR_HTTP2_STREAM_ERROR'
}

Node.js v23.0.0-pre
```
- `http2-pipe` fails only on the latest Fedora (but the previous release passes!) and on RHEL 8 (but RHEL 9 passes!).
- `http2-respond-with-file-connection-abort` fails on SmartOS and on linux-containered -> ubuntu2204_sharedlibs_shared_x64.
- The above fails on AIX too; additionally, `http2-session-unref` fails with a timeout.
- `http2-server-close-callback` also fails on Windows with the same ECONNRESET error in this PR, but only with vs2022; vs2019 passes.
I have an idea how to solve the macOS problem of preferring RST in the recv queue. We can just ignore ECONNRESET in `socketOnError` and let `socketOnClose` handle it. It does not matter whether the endpoint sent a premature TCP FIN or a TCP RST.
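A sketch of that idea (illustrative only; the real `socketOnError`/`socketOnClose` live in `lib/internal/http2/core.js`, and `destroySession` here is a made-up stand-in for the actual teardown path):

```javascript
// Swallow ECONNRESET in the error handler and let the close handler treat the
// teardown uniformly, since macOS reports ECONNRESET when a FIN and an RST
// are both sitting in the recv queue.
function socketOnError(error, destroySession) {
  if (error.code === 'ECONNRESET') {
    return 'deferred-to-close'; // socketOnClose runs either way
  }
  destroySession(error);        // any other error still tears the session down
  return 'destroyed';
}

let destroyedWith = null;
const destroySession = (err) => { destroyedWith = err.code; };

console.log(socketOnError({ code: 'ECONNRESET' }, destroySession)); // deferred-to-close
console.log(socketOnError({ code: 'EPIPE' }, destroySession));      // destroyed
console.log(destroyedWith);                                         // EPIPE
```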
Okay, now it should work. However, I still don't know why `test-http2-session-unref` timed out on AIX or why `test-http2-pipe` failed on fedora-latest.
Another question remains: which error code should we use in the HTTP/2 layer on a premature close? There's `CONNECT_ERROR` for CONNECT requests, but there's no equivalent for non-CONNECT requests :( Currently the default is `NGHTTP2_INTERNAL_ERROR`. HTTP/3 has `H3_CLOSED_CRITICAL_STREAM`, but HTTP/2 has no equivalent.
I managed to install the latest SmartOS and overcome this, and I had to rename `sys/m1man.h` to `sys/mman.h` in `deps/v8/src/base/platform/platform-tools.cc`, otherwise it wouldn't compile.
However, when running `make -j 12` I still encounter the GCC compiler bug where it leaks memory and crashes (on some other distros it segfaults). I installed clang 18, which doesn't have that problem, but clang refuses to compile with gnu++20 on SmartOS and I have no idea how to install LLVM's libc. I give up on manually testing on SmartOS.
AIX requires an IBM account and Solaris requires an Oracle account, and I don't have those. They aren't freely available, so I won't manually test on them.
I tested manually on RHEL 8 (the Rocky Linux flavor) and it passed; macOS passes as well. I'll test on the latest Fedora next. If it passes there, I'll fix that single lint error and the first commit message, and it's good to go. If there are any other CI errors that I'm not able to reproduce, somebody else will need to take over.