Since v20.13.0, fetch()ing small files without reading their body delays process termination by 8 seconds
Version
v20.13.0
Platform
Linux 33f5daf61896 6.4.16-linuxkit #1 SMP PREEMPT Thu Nov 16 10:49:20 UTC 2023 aarch64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
docker run --rm -ti node:20.13.0 /bin/bash
cat > test.js
async function main() {
  const x = await fetch('http://cms6-sites.webhare.com/test-arnold/empty.docx');
  console.log(x);
}
main()
^D
node test.js
The node process will not terminate after downloading the document
How often does it reproduce? Is there a required condition?
100%
What is the expected behavior? Why is that the expected behavior?
Process should terminate
What do you see instead?
Process does not terminate
Additional information
- downgrading to 20.12.2 fixes the issue
- the reported uname above is from a Mac M3 running Linux inside Docker, but I can reproduce it on native x86_64 Linux hosts too
- it does not fail when the document is served under a different mimetype (e.g. try with http://cms6-sites.webhare.com/test-arnold/empty.docx)
- it also hangs when the URL is changed to https
- I can also reproduce it with other docx files
Edit: fixed the version number where it still works; there is no 20.12.3, oops.
I see a connection timeout instead of a hang with 20.13.0 and 22.1.0.
The problem is not specific to fetch; it looks like a more fundamental network connection issue.
You can see the same happen with http.request:
import { request } from 'http'

request('http://cms6-sites.webhare.com/test-arnold/empty.docx', (response) => {
  console.log(response)
  response.resume()
}).end()
> more fundamental network connection
Possibly. To rule out an issue with the servers I control (although I originally found the issue on localhost with a different server setup), I've also tested against external servers.
It seems file-size related. Now, with this code:
async function main() {
  const x = await fetch(process.argv[2]);
  console.log(x);
}
main()
These are the results I see now, where 'Fails' means 'it downloads but the process never terminates'.
To confirm it's not my server:
# FAILS: 20034 bytes, completely different server I know nothing about
node test.js https://www.mhlw.go.jp/content/11120000/000908344.docx
# FAILS: 20306 bytes
node test.js https://www.careers.govt.nz/assets/pages/cv-and-cover-letter-templates/Cover-letter-template.docx
# Works: 92682 bytes
node test.js https://www.euro-divers.com/wp-content/uploads/2020/05/scuba-diving-instructor-resume.docx
To demonstrate that it appears to be size related (all these docs are random data):
# Works: 164864 bytes
node test.js http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx
# Fails: 81920 bytes
node test.js http://cms6-sites.webhare.com/test-arnold/empty-80kb.docx
# Fails: 93184 bytes
node test.js http://cms6-sites.webhare.com/test-arnold/empty-91kb.docx
# Works: 94208 bytes
node test.js http://cms6-sites.webhare.com/test-arnold/empty-92kb.docx
And it's not the file size per se; it seems to have to do with the response decoding:
# Works: 20608 bytes, but this file got gzipped by nginx because it didn't recognize the extension, and was transferred as chunked
node test.js http://cms6-sites.webhare.com/test-arnold/empty.stuff
As far as I can tell, docs < 80 KB fail and docs > 160 KB work, and I can confirm this with a few random servers found via Google. For me the exact cutoff point is somewhere between 91 KB and 92 KB, but if it's network related, the exact cutoff might differ for others.
EDIT: now also tested from a Rocky 9 host at Linode, i.e. in a data center, where the cutoff seems to be somewhere between 120 KB and 160 KB.
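To make the size probing repeatable, here is a rough sketch of how the cutoff could be checked in one go; this is a helper idea of mine rather than part of the original repro, and the 10-second kill timeout is an arbitrary choice:

import { execFile } from 'node:child_process';

// Run test.js against each size in a child process with a kill timeout,
// so a hang shows up as a killed child instead of a stuck terminal.
const sizes = ['empty-80kb', 'empty-91kb', 'empty-92kb', 'empty-160kb'];
for (const name of sizes) {
  const url = `http://cms6-sites.webhare.com/test-arnold/${name}.docx`;
  execFile(process.execPath, ['test.js', url], { timeout: 10_000 }, (err) => {
    console.log(`${name}: ${err && err.killed ? 'hung (killed after 10s)' : 'terminated'}`);
  });
}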
It is a bit different in my environment.
All of the links from http://cms6-sites.webhare.com result in a network timeout, but all the others work fine.
The addresses I connected to for cms6-sites.webhare.com are 104.248.92.95 and 2a03:b0c0:2:f0::a4:5001.
import { request as http } from 'http';
import { request as https } from 'https';
import { fetch } from 'undici';

const sites = [
  'http://cms6-sites.webhare.com/test-arnold/empty.docx',
  'http://cms6-sites.webhare.com/test-arnold/empty-80kb.docx',
  'http://cms6-sites.webhare.com/test-arnold/empty-91kb.docx',
  'http://cms6-sites.webhare.com/test-arnold/empty-92kb.docx',
  'http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx',
  'https://www.mhlw.go.jp/content/11120000/000908344.docx',
  'https://www.careers.govt.nz/assets/pages/cv-and-cover-letter-templates/Cover-letter-template.docx',
  'https://www.euro-divers.com/wp-content/uploads/2020/05/scuba-diving-instructor-resume.docx'
]

for (const site of sites) {
  _fetch(site)
    .then(() => { console.log('fetch done: %s', site) })
    .catch((err) => { console.log('fetch fail: %s', site) })
  _request(site)
    .then(() => { console.log('request done: %s', site) })
    .catch((err) => { console.log('request fail: %s', site) })
}

function _fetch(href) {
  return fetch(href)
}

function _request(href) {
  return new Promise((resolve, reject) => {
    const request = href.startsWith('https') ? https : http
    request(href, (response) => {
      resolve(response)
      response.resume()
    }).on('error', (err) => {
      reject(err)
    }).end()
  })
}
request done: https://www.mhlw.go.jp/content/11120000/000908344.docx
fetch done: https://www.mhlw.go.jp/content/11120000/000908344.docx
request done: https://www.euro-divers.com/wp-content/uploads/2020/05/scuba-diving-instructor-resume.docx
fetch done: https://www.euro-divers.com/wp-content/uploads/2020/05/scuba-diving-instructor-resume.docx
request done: https://www.careers.govt.nz/assets/pages/cv-and-cover-letter-templates/Cover-letter-template.docx
fetch done: https://www.careers.govt.nz/assets/pages/cv-and-cover-letter-templates/Cover-letter-template.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty-80kb.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty-91kb.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty-92kb.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty-80kb.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty-92kb.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty-91kb.docx
The problem is related to the auto-select-family attempt timeout. You can increase the timeout to a more reasonable value if your network is slow:
import { setDefaultAutoSelectFamilyAttemptTimeout } from 'net';
setDefaultAutoSelectFamilyAttemptTimeout(1000)
cc @ShogunPanda
I see you added the command-line option --network-family-autoselection-attempt-timeout, but it does not seem to take effect in my testing.
Only setDefaultAutoSelectFamilyAttemptTimeout affects the timeout.
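For reference, a typical invocation would look something like this (URL and timeout value are just for illustration):

node --network-family-autoselection-attempt-timeout=1000 test.js http://cms6-sites.webhare.com/test-arnold/empty.docx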
> The problem is related to the auto-select-family attempt timeout. You can increase the timeout to a more reasonable value if your network is slow.
I'm not seeing network timeouts. I always see the requests complete immediately (well, at least in < 300 ms; I haven't properly measured it).
My problem is that the process is not terminating (apparently because of a lingering/leaking TCPSocketWrap object, because that's what my CI reported, but I haven't yet built a test case that would also pinpoint it).
If you're seeing network timeouts that's a pity, because then you're not reproducing the issue I'm seeing. I see you're in Hong Kong and all the test servers I have easy access to are in Western Europe, so latency may be a factor.
If no one else chimes in to confirm or deny being able to reproduce the issue, I'll try to set up the files elsewhere.
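For anyone who wants to check for the lingering handle themselves, a minimal sketch of the kind of diagnostic I have in mind (not the CI test case itself; process.getActiveResourcesInfo() is an experimental API):

async function main() {
  const x = await fetch(process.argv[2]);
  console.log(x.status);
}

main().then(() => {
  // Once the fetch has resolved, list whatever is still keeping the event loop alive.
  // A leftover TCPSocketWrap here would match what my CI reported.
  setImmediate(() => console.log(process.getActiveResourcesInfo()));
});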
> My problem is that the process is not terminating (apparently because of a lingering/leaking TCPSocketWrap object, because that's what my CI reported, but I haven't yet built a test case that would also pinpoint it)
~~Yes, it is not terminating with the built-in one.
I was checking this along with the timeout problem, using the undici one.
If you use the undici one, it actually exits properly once the keep-alive timeout is over.
I assume the problem has been fixed somehow upstream and will eventually land in the v20 release line.~~
I spoke too early: it closes in both cases, but you have to wait quite a bit longer with the built-in one.
The built-in one waits until the 60-second keep-alive timeout; the undici one exits earlier, once there are no more connections after the keep-alive check threshold.
> I assume the problem has been fixed somehow upstream and will eventually land in the v20 release line
Built-in ends in 65839.6259ms
Undici ends in 8334.4074ms
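Roughly the shape of the measurement (a sketch, not the exact script; it times from the request until the process exits, without consuming the body):

// usage: node measure.mjs <url> [undici]
import { fetch as undiciFetch } from 'undici';

const useUndici = process.argv[3] === 'undici';
const impl = useUndici ? undiciFetch : fetch;  // built-in fetch vs undici fetch
const start = performance.now();

const res = await impl(process.argv[2]);
console.log(res.status);

process.on('exit', () => {
  console.log(`${useUndici ? 'Undici' : 'Built-in'} ends in ${(performance.now() - start).toFixed(4)}ms`);
});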
keep-alive might indeed be part of the issue, but that doesn't directly explain why the downloaded file size (or, more likely, timing differences caused by file size) changes the behavior. The server doesn't change its keep-alive behavior based on size, so I'd expect the process to either always terminate or always hang, not to depend on the downloaded size.
Thanks for the pointer. If I go straight for undici, i.e. const { fetch } = require("undici"):
npm i [email protected]
node test.js http://cms6-sites.webhare.com/test-arnold/empty-40kb.docx # works
node test.js http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx # works
npm i [email protected]
node test.js http://cms6-sites.webhare.com/test-arnold/empty-40kb.docx # FAILS: process hangs
node test.js http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx # works
the change: https://github.com/nodejs/undici/security/advisories/GHSA-9f24-jqhm-jfcw
Aha, backpressure is probably the keyword here, as that would be related to file sizes. The workaround is to consume the incoming body, which I indeed don't do, but the fact that that's listed as a workaround suggests I shouldn't need to do it, just as in 6.6.1.
If I consume the body I don't see the timeout anymore, but I didn't care about the body. As undici 6.16.0 still shows the same behavior, I'd probably better file a bug there first, in case this behavior is indeed intentional.
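For completeness, the consume-the-body workaround applied to the test script looks roughly like this (a sketch; cancelling the body would presumably also do when the content isn't needed):

async function main() {
  const x = await fetch(process.argv[2]);
  console.log(x);
  // Drain the body even though we don't need it, so the connection can be released promptly.
  await x.arrayBuffer();
  // Alternatively, if the content really isn't needed: await x.body?.cancel();
}
main()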
Thanks for your help and for pointing me in the right direction.
PS: the process does terminate after 8 seconds, but for my use case that's an eternity :-)
I posted the issue to undici here: https://github.com/nodejs/undici/issues/3230 - but without them changing anything, it's now fixed in Node 22 (and still broken in 20 and 21).
I've run a bisect, and as far as I can tell this commit to Node 22 fixed the issue: https://github.com/nodejs/node/commit/1abff073921bcb0631602032aef0135bccfaee0d