node icon indicating copy to clipboard operation
node copied to clipboard

http2, tls: check content-length, fix RST and GOAWAY logic

Open szmarczak opened this issue 1 year ago • 35 comments

Fixes #35209 Closes #35378

Time spent: 90h

[!CAUTION] These bug fixes are potentially semver-major!

  1. Fixed stream.close(NGHTTP2_CANCEL) closing with 0 aka NGHTTP2_NO_ERROR.
  2. serverStream.destroy() closed with 0, now NGHTTP2_INTERNAL_ERROR.
  3. clientStream.destroy() closed with 0, now NGHTTP2_CANCEL.
  4. Mismatched content-length now throws NGHTTP2_PROTOCOL_ERROR as according to the spec.
  5. Fixed GOAWAY (server -> client) being greeted with GOAWAY (client -> server) as it's against the spec.
  6. Fixed streams being always closed with NGHTTP2_CANCEL on socket close, now respects goawayCode and destroyCode. The default is NGHTTP2_INTERNAL_ERROR for premature TCP FIN and TCP RST.
  7. Fixed stream.rstCode being 0 while active - docs say it should be undefined.
  8. Fixed preferring sessionCode over rstCode when destroying a stream with an error.
  9. Fixed streams being closed with NO_ERROR if session received GOAWAY with NO_ERROR and remote closed connection.
  10. Fixed streams being closed with NO_ERROR if session sent GOAWAY with NO_ERROR and destroyed.
  11. Fixed GOAWAY preventing RST_STREAM from being sent before GOAWAY. nghttp2 correctly assumes that it should prevent RST_STREAM from being sent but incorretly applies it to all frames in a packet instead of frames defined after GOAWAY. This is not the only thing that nghttp2 does wrong:
    • https://github.com/nghttp2/nghttp2/issues/1166
    • https://github.com/nghttp2/nghttp2/issues/692
  12. Fixed benchmark/http2/headers.js calling client.destroy() (resulting in dropped requests). Now calls client.close() which closes gracefully.
  13. connectStreamSocket.destroy() now closes with NGHTTP2_CONNECT_ERROR.
  14. Fixed double GOAWAY on session.close().
  15. Fixed queued RST_STREAM and GOAWAY being dropped if Http2Session::Close() and previous writes weren't finished yet.
  16. Fixed stream frame errors closing the entire session instead of just the stream.
  17. Fixed stream frame errors sending RST_STREAM on idle streams (to reproduce 16. needs to be fixed).
  18. Fixed https://github.com/nodejs/node/issues/49484
  19. Fixed server close not being emitted if TLS socket RSTs
  20. Fixed deadlock when sending GOAWAY before receiving server's GOAWAY. It won't gracefully close because the server may never ACK our GOAWAY. Now it sends TCP RST after terminating GOAWAY.

Sorry so many bugs are fixed in a single PR but it's impossible to fix one without fixing them all!

Best if https://github.com/nghttp2/nghttp2/pull/1508 got merged before this, but it has been open for years 😢 Hence the patch for nghttp2_session.c


/cc @jasnell

szmarczak avatar May 26 '24 05:05 szmarczak

Review requested:

  • [ ] @nodejs/http
  • [ ] @nodejs/http2
  • [ ] @nodejs/net
  • [ ] @nodejs/security-wg

nodejs-github-bot avatar May 26 '24 05:05 nodejs-github-bot

The linter and commit validator are failing, could you check?

mcollina avatar May 26 '24 07:05 mcollina

An HTTP response is complete after the server sends -- or the client receives -- a frame with the END_STREAM flag set (including any CONTINUATION frames needed to complete a field block). A server can send a complete response prior to the client sending an entire request if the response does not depend on any portion of the request that has not been sent and received. When this is true, a server MAY request that the client abort transmission of a request without error by sending a RST_STREAM with an error code of NO_ERROR after sending a complete response (i.e., a frame with the END_STREAM flag set). Clients MUST NOT discard responses as a result of receiving such a RST_STREAM, though clients can always discard responses at their discretion for other reasons.

That's interesting. RST_STREAM with NO_ERROR means: don't send anything to me, but process what I sent you. Rapid reset? (good it's fixed in nghttp2)

szmarczak avatar May 29 '24 09:05 szmarczak

@jasnell could you reach out to the nghttp2 maintainers and ask about the status of that PR? I would sincerely hope we wouldn't have to maintain a floating patch.

mcollina avatar May 29 '24 12:05 mcollina

CI: https://ci.nodejs.org/job/node-test-pull-request/59515/

nodejs-github-bot avatar May 29 '24 12:05 nodejs-github-bot

I got stuck. test-http2-multistream-destroy-on-read-tls.js doesn't receive last RST for /j. This happens only on TLS. I used the keylog TLS event to get the key and I used Wireshark to inspect the traffic, and I was amazed to see this:

Screenshot 2024-05-29 at 20 00 37

Aside the missing RST, there's no GOAWAY frame from the client. Even if I restore node_http2.cc:794 to be

CHECK_EQ(nghttp2_session_terminate_session(session_.get(), code), 0);

the result is the same - no GOAWAY frame. I don't know how nghttp2 builds frames, nor how the streams work internally so I would greatly appreciate any help here!

szmarczak avatar May 29 '24 18:05 szmarczak

~~After digging it looks like there's a bug in the queue mechanism in nghttp2?~~

Edit: For some reason the session addresses differ? Investigating! (I won't ping nghttp2 yet till I know it's definitely in nghttp2)

PUSH RST_STREAM TO QUEUE session=0x5a831ddf2128 ob_reg=0x5a831ddf2240 item=0x5a831dc43b88
pointer 0x5a831dc43b88
to head
node goaway, nghttp2 session=0x5a831ddf2128
pop session=0x5a831de25c08 ob_reg=0x5a831de25d20 top=(nil)

szmarczak avatar May 30 '24 11:05 szmarczak

Have you tried pinging nghttp2?

mcollina avatar May 30 '24 11:05 mcollina

Ok I know what's happening:

submit rst stream
adding rst stream
PUSH RST_STREAM TO QUEUE session=0x60cd23c980e8 ob_reg=0x60cd23c98200 item=0x60cd23979b88
pointer 0x60cd23979b88
to head
goaway
node goaway, nghttp2 session=0x60cd23c980e8
sending pending data session=0x60cd23c980e8
sending, returning
write finished
sending pending data session=0x60cd23c980e8
destroyed, returning
sending pending data session=0x60cd23c980e8
destroyed, returning

It's a bug in Node.js. It can't send the next RST_STREAM & GOAWAY, because it's sending the previous ones. Relevant code:

  if (!socket_closed) {
    Debug(this, "terminating session with code %d", code);
    CHECK_EQ(nghttp2_session_terminate_session(session_.get(), code), 0);
    SendPendingData(); // fails because it's already sending the previous frames
  } else if (stream_ != nullptr) {
    stream_->RemoveStreamListener(this);
  }


  set_destroyed(); // gets destroyed before finished writing
  // queued writes aren't sent because it got destroyed above, so OnStreamAfterWrite -> MaybeScheduleWrite never runs because the stream is destroyed!

I think I know how to fix it.

szmarczak avatar May 30 '24 12:05 szmarczak

CI: https://ci.nodejs.org/job/node-test-pull-request/59539/

nodejs-github-bot avatar May 30 '24 13:05 nodejs-github-bot

Umm... gcc segfaults on my side when trying to build with asan for test-http2-server-rst-stream.js (that test throws std::bad_alloc):

g++ -o /home/ubuntu/Desktop/node/out/Release/obj.target/gen-regexp-special-case/deps/v8/src/regexp/gen-regexp-special-case.o ../deps/v8/src/regexp/gen-regexp-special-case.cc '-D_GLIBCXX_USE_CXX11_ABI=1' '-DNODE_OPENSSL_CONF_NAME=nodejs_conf' '-DNODE_OPENSSL_HAS_QUIC' '-DICU_NO_USER_DATA_OVERRIDE' '-DV8_GYP_BUILD' '-DV8_TYPED_ARRAY_MAX_SIZE_IN_HEAP=64' '-DLEAK_SANITIZER' '-DV8_USE_ADDRESS_SANITIZER' '-D__STDC_FORMAT_MACROS' '-DOPENSSL_NO_PINSHARED' '-DOPENSSL_THREADS' '-DV8_TARGET_ARCH_X64' '-DV8_HAVE_TARGET_OS' '-DV8_TARGET_OS_LINUX' '-DV8_EMBEDDER_STRING="-node.12"' '-DENABLE_DISASSEMBLER' '-DV8_PROMISE_INTERNAL_FIELD_COUNT=1' '-DV8_ENABLE_PRIVATE_MAPPING_FORK_OPTIMIZATION' '-DV8_SHORT_BUILTIN_CALLS' '-DOBJECT_PRINT' '-DV8_INTL_SUPPORT' '-DV8_ATOMIC_OBJECT_FIELD_WRITES' '-DV8_ENABLE_LAZY_SOURCE_POSITIONS' '-DV8_USE_SIPHASH' '-DV8_SHARED_RO_HEAP' '-DNDEBUG' '-DV8_WIN64_UNWINDING_INFO' '-DV8_ENABLE_REGEXP_INTERPRETER_THREADED_DISPATCH' '-DV8_USE_ZLIB' '-DV8_ENABLE_SPARKPLUG' '-DV8_ENABLE_MAGLEV' '-DV8_ENABLE_TURBOFAN' '-DV8_ENABLE_WEBASSEMBLY' '-DV8_ENABLE_JAVASCRIPT_PROMISE_HOOKS' '-DV8_ENABLE_CONTINUATION_PRESERVED_EMBEDDER_DATA' '-DV8_ALLOCATION_FOLDING' '-DV8_ALLOCATION_SITE_TRACKING' '-DV8_ADVANCED_BIGINT_ALGORITHMS' '-DUCONFIG_NO_SERVICE=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=1' '-DUCONFIG_NO_BREAK_ITERATION=0' -I../deps/v8 -I../deps/v8/include -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common  -fno-omit-frame-pointer -fsanitize=address -fsanitize-address-use-after-scope -pthread -Wno-unused-parameter -Wno-strict-overflow -Wno-return-type -Wno-int-in-bool-context -Wno-deprecated -Wno-stringop-overflow -Wno-stringop-overread -Wno-restrict -Wno-array-bounds -Wno-nonnull -Wno-dangling-pointer -flax-vector-conversions -fno-strict-aliasing -m64 -m64 -fno-omit-frame-pointer -fdata-sections -ffunction-sections -O2 -fno-rtti -fno-exceptions -std=gnu++20 -Wno-invalid-offsetof -MMD -MF /home/ubuntu/Desktop/node/out/Release/.deps//home/ubuntu/Desktop/node/out/Release/obj.target/gen-regexp-special-case/deps/v8/src/regexp/gen-regexp-special-case.o.d.raw   -c
Segmentation fault (core dumped)
$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.2.0-4ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (Ubuntu 13.2.0-4ubuntu3)

I'll try upgrading Ubuntu (I hate Ubuntu).

szmarczak avatar May 30 '24 19:05 szmarczak

Well, after upgrading Ubuntu the resolution is now locked to 1024 x 768 (4:3), while my monitor is 16:9. It looks so stretched and pixelated :( I've been hating Ubuntu since at least 2018 (prolly since 2016) and I don't recommend anyone using it. It breaks randomly (black screen of death), can't set default audio device, can't set default programs for file extensions and bunch of other issues I don't remember.

I'll try some other distro tomorrow. I just used Ubuntu cuz I needed Linux fast some time ago and didn't have the time to set up some bare-bones distro from scratch. I used to daily drive Arch but it started breaking as well (it was throwing fstab errors, black screen, reinstall worked but repeat this 3 times and I got angry again). I never got the Debian installer past the Installing... progress bar, it always got stuck.

Edit: lmao after upgrade it put itself first on the efi boot entry list, what a shitty distro Edit 2: My home server is driving Alpine, so I though I'd use Alpine as well. It turns out since December 2022 NVIDIA forces you to use glibc 🤦🏼

Time to log off for today!

szmarczak avatar May 30 '24 20:05 szmarczak

CI: https://ci.nodejs.org/job/node-test-pull-request/59569/

nodejs-github-bot avatar May 31 '24 14:05 nodejs-github-bot

Even with GCC 14.2 the compiler segfaults:

logs
../deps/v8/src/compiler/turboshaft/machine-lowering-phase.cc:31:1: internal compiler error: Segmentation fault
   31 | }  // namespace v8::internal::compiler::turboshaft
      | ^
AddressSanitizer:DEADLYSIGNAL
=================================================================
==12073==ERROR: AddressSanitizer: SEGV on unknown address 0x10007fff8000 (pc 0x559c5152c002 bp 0x7ffde2507120 sp 0x7ffde2506ff0 T0)
==12073==The signal is caused by a READ memory access.
0x7fe0d72068bf ???
	./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7fe0d724bfa6 _IO_new_file_xsputn
	./libio/fileops.c:1236
0x7fe0d724bfa6 _IO_new_file_xsputn
	./libio/fileops.c:1197
0x7fe0d723f926 __GI__IO_fputs
	./libio/iofputs.c:38
0x7fe0d71f0c4b __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58
0x7fe0d71f0d04 __libc_start_main_impl
	../csu/libc-start.c:360
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://github.com/Homebrew/homebrew-core/issues> for instructions.
    #0 0x559c5152c002 in std::_Hashtable<v8::internal::torque::Item, v8::internal::torque::Item, std::allocator<v8::internal::torque::Item>, std::__detail::_Identity, std::equal_to<v8::internal::torque::Item>, v8::base::hash<v8::internal::torque::Item>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<v8::internal::torque::Item, true>*, unsigned long) (/home/szmarczak/node/out/Release/torque+0x394002)
    #1 0x559c51530006 in v8::internal::torque::RunEarleyAlgorithm(v8::internal::torque::Symbol*, v8::internal::torque::LexerResult const&, std::unordered_set<v8::internal::torque::Item, v8::base::hash<v8::internal::torque::Item>, std::equal_to<v8::internal::torque::Item>, std::allocator<v8::internal::torque::Item> >*) (/home/szmarczak/node/out/Release/torque+0x398006)
    #2 0x559c513b00f0 in v8::internal::torque::ParseTorque(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (/home/szmarczak/node/out/Release/torque+0x2180f0)
    #3 0x559c5128a3a0 in v8::internal::torque::CompileTorque(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, v8::internal::torque::TorqueCompilerOptions) (/home/szmarczak/node/out/Release/torque+0xf23a0)
    #4 0x559c51262bd4 in v8::internal::torque::WrappedMain(int, char const**) (/home/szmarczak/node/out/Release/torque+0xcabd4)
    #5 0x559c51258008 in main (/home/szmarczak/node/out/Release/torque+0xc0008)
    #6 0x7f6fa1a3fc4b in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #7 0x7f6fa1a3fd04 in __libc_start_main_impl ../csu/libc-start.c:360
    #8 0x559c51258440 in _start ../sysdeps/x86_64/start.S:115

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/szmarczak/node/out/Release/torque+0x394002) in std::_Hashtable<v8::internal::torque::Item, v8::internal::torque::Item, std::allocator<v8::internal::torque::Item>, std::__detail::_Identity, std::equal_to<v8::internal::torque::Item>, v8::base::hash<v8::internal::torque::Item>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<v8::internal::torque::Item, true>*, unsigned long)
==12073==ABORTING
make[1]: *** [tools/v8_gypfiles/run_torque.target.mk:17: 8f4df544355fcd0a9a2a203eaf6cae85b7846f41.intermediate] Error 1
rm 8f4df544355fcd0a9a2a203eaf6cae85b7846f41.intermediate f7fdbdaa32bca4a9be395232d9d587081a1ca0ee.intermediate ad1f1002e1d464ed9b31278a2e689c1f8df273d2.intermediate
make: *** [Makefile:137: node] Error 2
make: *** Waiting for unfinished jobs....
  g++-14 -o /home/szmarczak/node/out/Debug/obj.target/v8_turboshaft/deps/v8/src/compiler/turboshaft/use-map.o ../deps/v8/src/compiler/turboshaft/use-map.cc '-D_GLIBCXX_USE_CXX11_ABI=1' '-DNODE_OPENSSL_CONF_NAME=nodejs_conf' '-DNODE_OPENSSL_HAS_QUIC' '-DICU_NO_USER_DATA_OVERRIDE' '-DV8_GYP_BUILD' '-DV8_TYPED_ARRAY_MAX_SIZE_IN_HEAP=64' '-DLEAK_SANITIZER' '-DV8_USE_ADDRESS_SANITIZER' '-D__STDC_FORMAT_MACROS' '-DOPENSSL_NO_PINSHARED' '-DOPENSSL_THREADS' '-DV8_TARGET_ARCH_X64' '-DV8_HAVE_TARGET_OS' '-DV8_TARGET_OS_LINUX' '-DV8_EMBEDDER_STRING="-node.9"' '-DENABLE_DISASSEMBLER' '-DV8_PROMISE_INTERNAL_FIELD_COUNT=1' '-DV8_ENABLE_PRIVATE_MAPPING_FORK_OPTIMIZATION' '-DV8_SHORT_BUILTIN_CALLS' '-DOBJECT_PRINT' '-DV8_INTL_SUPPORT' '-DV8_ATOMIC_OBJECT_FIELD_WRITES' '-DV8_ENABLE_LAZY_SOURCE_POSITIONS' '-DV8_USE_SIPHASH' '-DV8_SHARED_RO_HEAP' '-DNDEBUG' '-DV8_WIN64_UNWINDING_INFO' '-DV8_ENABLE_REGEXP_INTERPRETER_THREADED_DISPATCH' '-DV8_USE_ZLIB' '-DV8_ENABLE_SPARKPLUG' '-DV8_ENABLE_MAGLEV' '-DV8_ENABLE_TURBOFAN' '-DV8_ENABLE_WEBASSEMBLY' '-DV8_ENABLE_JAVASCRIPT_PROMISE_HOOKS' '-DV8_ENABLE_CONTINUATION_PRESERVED_EMBEDDER_DATA' '-DV8_ALLOCATION_FOLDING' '-DV8_ALLOCATION_SITE_TRACKING' '-DV8_ADVANCED_BIGINT_ALGORITHMS' '-DUCONFIG_NO_SERVICE=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=1' '-DUCONFIG_NO_BREAK_ITERATION=0' '-DDEBUG' '-D_DEBUG' '-DENABLE_HANDLE_ZAPPING' -I../deps/v8 -I../deps/v8/include -I/home/szmarczak/node/out/Debug/obj/gen/generate-bytecode-output-root -I/home/szmarczak/node/out/Debug/obj/gen -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common -I../deps/v8/third_party/abseil-cpp -I../deps/v8/third_party/fp16/src/include  -fno-omit-frame-pointer -fsanitize=address -fsanitize-address-use-after-scope -pthread -Wno-unused-parameter -Wno-strict-overflow -Wno-return-type -Wno-int-in-bool-context -Wno-deprecated -Wno-stringop-overflow -Wno-stringop-overread -Wno-restrict -Wno-array-bounds -Wno-nonnull -Wno-dangling-pointer -flax-vector-conversions -m64 -m64 -g -fdata-sections -ffunction-sections -O2 -fno-rtti -fno-exceptions -fno-strict-aliasing -std=gnu++20 -Wno-invalid-offsetof -MMD -MF /home/szmarczak/node/out/Debug/.deps//home/szmarczak/node/out/Debug/obj.target/v8_turboshaft/deps/v8/src/compiler/turboshaft/use-map.o.d.raw   -c
make[1]: *** [tools/v8_gypfiles/v8_turboshaft.target.mk:250: /home/szmarczak/node/out/Debug/obj.target/v8_turboshaft/deps/v8/src/compiler/turboshaft/machine-lowering-phase.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:137: node_g] Error 2

I'll try with clang.

szmarczak avatar Sep 30 '24 21:09 szmarczak

Ok, clang works but it's 3.20 am here, I'll come back to this in the afternoon.

szmarczak avatar Oct 01 '24 01:10 szmarczak

I just found a race after enabling RST_STREAM and GOAWAY in the same packet. When the server sends both RST_STREAM and a terminating GOAWAY in a single frame, the client might send its GOAWAY just after parsing RST_STREAM because it no longer needs the session, but before acknowledging that there's still a GOAWAY to parse.

Currently the client sends the GOAWAY, but the server doesn't read it, resulting in stall by the client side as it waits for the server to acknowledge the client's GOAWAY packet, which will never happen.

I am yet to figure out a solution to this problem.

szmarczak avatar Oct 04 '24 10:10 szmarczak

Codecov Report

Attention: Patch coverage is 91.61290% with 13 lines in your changes missing coverage. Please review.

Project coverage is 88.41%. Comparing base (103b843) to head (2f4d66f). Report is 131 commits behind head on main.

Files with missing lines Patch % Lines
lib/internal/http2/core.js 92.24% 9 Missing :warning:
lib/net.js 90.90% 2 Missing :warning:
src/node_http2.cc 84.61% 1 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #53160      +/-   ##
==========================================
+ Coverage   88.23%   88.41%   +0.18%     
==========================================
  Files         651      652       +1     
  Lines      183863   186971    +3108     
  Branches    35824    36085     +261     
==========================================
+ Hits       162235   165317    +3082     
+ Misses      14932    14907      -25     
- Partials     6696     6747      +51     
Files with missing lines Coverage Δ
lib/_tls_wrap.js 93.40% <100.00%> (+0.01%) :arrow_up:
lib/net.js 95.08% <90.90%> (+0.85%) :arrow_up:
src/node_http2.cc 84.33% <84.61%> (-0.82%) :arrow_down:
lib/internal/http2/core.js 95.34% <92.24%> (-0.20%) :arrow_down:

... and 128 files with indirect coverage changes

codecov[bot] avatar Oct 04 '24 11:10 codecov[bot]

If the above is true then I believe it should send TCP RST when it sends GOAWAY and receives a GOAWAY afterwards to prevent stalling.

szmarczak avatar Oct 04 '24 13:10 szmarczak

Hmm... it seems that once the session is closed nghttp2 won't resume parsing the GOAWAY, so we won't know that there was a GOAWAY from the server. In that case there are two options:

  1. ~~patch nghttp2 to continue parsing only GOAWAYs and send TCP RST only if there was a GOAWAY from the server - I'm -1 on this since there may be erroneous frames in the queue,~~
  2. always send an RST after sending a terminating GOAWAY - I'm +1 on this one.

The downside with RST is that it won't be retransmitted if it isn't delivered. With FIN instead it would be delivered, but the connection would be stalled as the client's GOAWAY would be never ACKed. So the first could be better...

szmarczak avatar Oct 04 '24 14:10 szmarczak

Also the drawback with the first one is that the GOAWAY does not have to be necessarily in the same packet where the client decides to end the connection. So, to properly do this it would need to still parse frames and ignore all except GOAWAY.

I'll go with always sending an RST after sending a terminating GOAWAY.

szmarczak avatar Oct 05 '24 10:10 szmarczak

Blocked by https://github.com/nodejs/node/issues/49484

szmarczak avatar Oct 06 '24 10:10 szmarczak

Found another double destroy bug:

NET 24900: has server
Trace: disconnect
    at Socket._destroy (node:net:846:13)
    at _destroy (node:internal/streams/destroy:122:10)
    at Socket.destroy (node:internal/streams/destroy:84:5)
    at Writable.destroy (node:internal/streams/writable:1122:11)
    at TLSWrap.close (node:_tls_wrap:662:24)
    at closeSocketHandle (node:net:341:18)
    at Socket._destroy (node:net:830:7)
    at _destroy (node:internal/streams/destroy:122:10)
    at TLSSocket.destroy (node:internal/streams/destroy:84:5)
    at Writable.destroy (node:internal/streams/writable:1122:11)
NET 24900: SERVER _emitCloseIfDrained
NET 24900: SERVER handle? true   connections? 0
NET 24900: has server
Trace: disconnect
    at Socket._destroy (node:net:846:13)
    at _destroy (node:internal/streams/destroy:122:10)
    at TLSSocket.destroy (node:internal/streams/destroy:84:5)
    at Writable.destroy (node:internal/streams/writable:1122:11)
    at Server.closeIdleConnections (node:_http_server:606:27)
    at Http2SecureServer.closeIdleConnections (node:internal/http2/core:3221:7)
    at httpServerPreClose (node:_http_server:527:10)
    at Http2SecureServer.close (node:internal/http2/core:3214:7)
    at main (/home/szmarczak/node/test/parallel/test-http2-allow-http1.js:44:10)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
NET 24900: SERVER _emitCloseIfDrained
NET 24900: SERVER handle? true   connections? -1

will have to fix it in this PR :/

Edit: Actually it shuts down all handles using the same destroy function. It just needed a guard to decrement only at the most low-level handle.

szmarczak avatar Oct 06 '24 14:10 szmarczak

CI: https://ci.nodejs.org/job/node-test-pull-request/62975/

nodejs-github-bot avatar Oct 07 '24 16:10 nodejs-github-bot

$ NODE_DEBUG=http2 out/Debug/node test/parallel/test-http2-server-shutdown-redundant.js

macOS fails because https://github.com/szmarczak/node/blob/c887bc2aeca56aa1adcd22e84600840172614f2e/lib/internal/stream_base_commons.js#L213 nread is ECONNRESET while on Linux it's EOF.

macOS: image

Linux: image

I'm not sure why macOS ECONNRESETs here... I'm investigating.

szmarczak avatar Oct 08 '24 01:10 szmarczak

The same on macOS but without TCP RST: image

The only difference is that the server FINs instead of client RSTing. So I guess macOS ~~(or libuv?)~~ prefers RST when both FIN and RST are in the recv queue.

Edit: that's correct, if both FIN and RST are in recv queue then it ignores the FIN and emits read ECONNRESET instead of read EOF. I'd say that's a bug in macOS itself.

szmarczak avatar Oct 08 '24 01:10 szmarczak

const net = require('net');
const server = net.createServer({ allowHalfOpen: true }, socket => {}); // macOS: ECONNRESET Linux: pass
server.listen(() => {
	const socket = net.connect(server.address().port, '127.0.0.1');
	socket.once('connect', () => {
		socket.end(() => socket.resetAndDestroy());
	});
	socket.once('close', () => server.close());
});

Definitely either macOS ~~or libuv~~.

Edit: Windows seems to actually send RST only when the client received data after queueing send RST. It doesn't reset the connection if the server FINs.

const net = require('net');
const server = net.createServer({ allowHalfOpen: true }, socket => {
	setTimeout(() => socket.write('passes'), 100);
	setTimeout(() => socket.write('fails'), 200);
});

server.listen(() => {
	const socket = net.connect(server.address().port, '127.0.0.1');
	socket.pipe(process.stdout);

	socket.once('connect', () => {
		socket.end(() => socket.resetAndDestroy());
	});
	socket.once('close', () => server.close());
});

szmarczak avatar Oct 08 '24 01:10 szmarczak

If I'm reading libuv right, it seems like a macOS preference? ~~Edit: and possibly Windows?~~ edited comment above


So far Jenkins also says that

  node:events:485
        throw er; // Unhandled 'error' event
        ^

  Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_CANCEL
      at Http2Stream.onStreamClose (node:internal/http2/core:554:15)
  Emitted 'error' event on ServerHttp2Stream instance at:
      at emitErrorNT (node:internal/streams/destroy:170:8)
      at emitErrorCloseNT (node:internal/streams/destroy:129:3)
      at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
    code: 'ERR_HTTP2_STREAM_ERROR'
  }

  Node.js v23.0.0-pre

http2-pipe fails only on latest Fedora (but last-latest passes!) and RHEL 8 (but RHEL 9 passes!)

http2-respond-with-file-connection-abort fails on smartos and
linux-containered -> ubuntu2204_sharedlibs_shared_x64

The above fails on aix too, additionally http2-session-unref fails with a timeout.

http2-server-close-callback also fails on Windows with the same ECONNRESET error in this PR. But only with vs2022, vs2019 passes.

szmarczak avatar Oct 08 '24 02:10 szmarczak

I have an idea how to solve the macOS problem preferring RST in the recv queue. We can just ignore ECONNRESET in socketOnError and let socketOnClose handle it. It does not matter if the endpoint sent TCP FIN prematurely or TCP RST.

szmarczak avatar Oct 08 '24 12:10 szmarczak

Okay, now it should work. However I still don't know why test-http2-session-unref timeouted on aix or test-http2-pipe failed on fedora-latest.

Another question remains: what error to use in the HTTP/2 layer when we get premature close? There's CONNECT_ERROR for CONNECT methods but there's no equivalent for non-CONNECT methods :( Currently the default is NGHTTP2_INTERNAL_ERROR. HTTP/3 has H3_CLOSED_CRITICAL_STREAM but there's no equivalent for HTTP/2

szmarczak avatar Oct 08 '24 13:10 szmarczak

I managed to install latest SmartOS and overcome this and in deps/v8/src/base/platform/platform-tools.cc rename sys/m1man.h to sys/mman.h otherwise it wouldn't compile.

However running make -j 12 I still encounter the GCC compiler bug where it leaks memory and crashes (on some other distros it segfaults). I managed to install clang 18 as I don't have that problem there, but clang refuses to compile with gnu++20 on SmartOS and I have no idea how to install LLVM's libc. I give up manually testing on SmartOS.

aix requires an IBM account and solaris requires an Oracle account and I don't have those. Those aren't freely available so I won't manually test those.

I tested manually on RHEL 8 (Rocky Linux flavor) and it passed, macOS passes as well, I'll test on Fedora latest. If it passes on Fedora latest, I'm fixing that single lint error, the first commit name and it's good to go. If there any other CI errors that I'm not able to reproduce, somebody else would need to take over.

szmarczak avatar Oct 10 '24 19:10 szmarczak