node icon indicating copy to clipboard operation
node copied to clipboard

Flaky `test-stream-readable-unpipe-resume`

Open targos opened this issue 1 year ago • 9 comments

Test

test-stream-readable-unpipe-resume

Platform

Linux x64

Console output

14:09:40 not ok 3695 parallel/test-stream-readable-unpipe-resume
14:09:40   ---
14:09:40   duration_ms: 120099.24600
14:09:40   severity: fail
14:09:40   exitcode: -15
14:09:40   stack: |-
14:09:40     timeout
14:09:40   ...

Build links

  • https://ci.nodejs.org/job/node-test-commit-linux-containered/44931/nodes=ubuntu2204_sharedlibs_openssl31_x64/console

Additional information

No response

targos avatar Jul 31 '24 06:07 targos

Stress test: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/520/

targos avatar Aug 01 '24 08:08 targos

I can reproduce on Ubuntu.

diff --git a/test/parallel/test-stream-readable-unpipe-resume.js b/test/parallel/test-stream-readable-unpipe-resume.js
index b40f724bcc..3eeae52f5f 100644
--- a/test/parallel/test-stream-readable-unpipe-resume.js
+++ b/test/parallel/test-stream-readable-unpipe-resume.js
@@ -13,7 +13,9 @@ const transformStream = new stream.Transform({
   })
 });
 
-readStream.on('end', common.mustCall());
+readStream.on('end', common.mustCall(function () {
+  console.log('end');
+}));
 
 readStream
   .pipe(transformStream)

$ ./tools/test.py --repeat=10000 test/parallel/test-stream-readable-unpipe-resume
=== release test-stream-readable-unpipe-resume ===
Path: parallel/test-stream-readable-unpipe-resume
end
Command: out/Release/node /home/luigi/node/test/parallel/test-stream-readable-unpipe-resume.js
--- TIMEOUT ---


[05:21|% 100|+ 9999|-   1]: Done

Failed tests:
out/Release/node /home/luigi/node/test/parallel/test-stream-readable-unpipe-resume.js

This is yet another case (see https://github.com/nodejs/node/issues/52550#issuecomment-2343204697) where the test correctly finishes (note "end" in the output) but the process does not exit.

lpinca avatar Sep 11 '24 11:09 lpinca

I wish we were able to reproduce these more reliably, because I don't know how to debug something that only happens once or twice every 10k runs!

targos avatar Sep 11 '24 16:09 targos

It might be only a coincidence or a reduced failure rate, but even in this case I get no failures with the --jitless flag.

lpinca avatar Sep 11 '24 16:09 lpinca

Hypothetically, as a short-term solution, we could call process.exit() from these tests, but that acts like a bandaid, and not a patch.

- readStream.on('end', common.mustCall());
+ readStream.on('end', common.mustCall(() => process.exit()));

avivkeller avatar Oct 15 '24 00:10 avivkeller

@RedYetiDev it does not help, see https://github.com/nodejs/node/issues/52964#issuecomment-2106317993.

lpinca avatar Oct 15 '24 06:10 lpinca

Insterestingly, it seems to not fail if you change the line to:

readStream.on('end', common.mustCall(() => queueMicrotask(process.exit)));

I ran 10,000 runs with this change: 0 failures I ran 2,000 runs without this change: 4 failures

Although this could be a coincidence?

avivkeller avatar Oct 17 '24 18:10 avivkeller

Although this could be a coincidence?

Yes, I think so. For example, this

diff --git a/test/parallel/test-net-write-fully-async-hex-string.js b/test/parallel/test-net-write-fully-async-hex-string.js
index 37b5cd75c1..b37f2acefc 100644
--- a/test/parallel/test-net-write-fully-async-hex-string.js
+++ b/test/parallel/test-net-write-fully-async-hex-string.js
@@ -30,3 +30,7 @@ const server = net.createServer(common.mustCall(function(conn) {
     writeLoop();
   }));
 }));
+
+server.on('close', function () {
+  queueMicrotask(process.exit);
+});

makes no difference.

lpinca avatar Oct 17 '24 18:10 lpinca

This whole issue is strange. I wonder what task is hanging the process :thinking:

avivkeller avatar Oct 17 '24 18:10 avivkeller