Forward mode blocks indefinitely upon certain content-disposition header value
Overview
proxy-chain is observed to block indefinitely when certain conditions are met:
- It works in the forward mode (plain HTTP).
- There is
content-dispositionheader that contains certain non-ASCII content. - There is
content-lengthheader.
The root cause of issue is believed to originate from Node.js. Before an upstream fix is made (which might take time) proxy-chain here could apply some mitigation.
Upstream Issue: ServerResponse.writeHead may throw an error
There are already several upstream issue reports, one of which is https://github.com/nodejs/node/issues/50213
Mostly relevant is this section of code https://github.com/nodejs/node/blob/v20.16.0/lib/_http_outgoing.js#L554-L567 where the improper type change later involves implicit type conversion and unexpected charset decoding at https://github.com/nodejs/node/blob/v20.16.0/lib/_http_common.js#L216-L225
A minimal example for the decoding error: /[^\t\x20-\x7e\x80-\xff]/v.test(Buffer.from('à', 'latin1')) This then result in an invalid character error to be thrown (code) during ServerResponse.writeHead.
Impact to proxy-chain
As of the current revision, said issue would cause an error thrown from this call: https://github.com/apify/proxy-chain/blob/a1ac3f287e895270f4e34306ac754a62a9848fa4/src/forward.ts#L97-L101
It is then silently ignore here therefore the observed blocking: https://github.com/apify/proxy-chain/blob/a1ac3f287e895270f4e34306ac754a62a9848fa4/src/forward.ts#L110-L113
In earlier versions of proxy-chain (that is used when I noticed this issue), said error is not caught thus exposed. This leads to the error as reported here: https://github.com/apify/proxy-chain/issues/20
Sample Steps to Reproduce the Issue
Target web server:
$ socat -v -v TCP-LISTEN:8080,fork,reuseaddr,crlf SYSTEM:'echo HTTP/1.1 200 OK; echo "content-length: 6"; echo -e "content-disposition: \\\\xe0"; echo; echo -n "foobar"'
proxy-chain instance (just the minimal example from README.md):
const ProxyChain = require('proxy-chain');
const server = new ProxyChain.Server({ port: 8000 });
server.listen(() => {
console.log(`Proxy server is listening on port ${8000}`);
});
Sending the request:
$ curl -v -x localhost:8000 'http://localhost:8080'
Expected result: The request shall finish. (see also: curl -v 'http://localhost:8080')
Observed result: The request blocks indefinitely.
(checked with Node.js v20.16.0 which is the active LTS)
Proposed Actions
I'm afraid that it's unlikely that the upstream will make a fix soon, considering how long their tickets have been around (e.g. https://github.com/nodejs/node/issues/50213).
There are a few proposals on what could be done here:
- To manipulate the
content-dispositionheader ahead, so that the call ofServerResponse.writeHeadno longer fails. - Or, alter the response handler upon forwarding, so that the upstream error no longer blocks the request. Perhaps letting it fail is (slightly?) better than blocking.
- Also, maybe expose the error message to users so that they may have an idea on what's going on.
Hello @starrify,
thank you for the report and thorough description.
I'm thinking what would be the best course of action here and I'm wondering:
For the points you've mentioned in "Proposed Actions", what do you mean by "manipulate the content-disposition header"?
In my opinion, the second solution sounds good, letting the handler fail is much better than blocking the request. Sadly, we don't have that much capacity to work on this right now, would you be interested in submitting a pull request? I'd review it and release it afterwards.
Cheers
mean by "manipulate the content-disposition header"
Hi and sorry for not being clear on that. One example may be to first perform certain transcoding (e.g. encodeURI) on that header value so as to ensure later it won't fail the Node.js code.
Since Node.js's behavior indeed messes up the header value via its superfluous (and incorrect) trasncoding, doing encodeURI in proxy-chain would indeed help retain the original header value.
BTW in case it might help, I am in the project from my side taking a kind-of-dirty-yet-quick approach by patching the Node.js executable to disable the behavior introduced in https://github.com/nodejs/node/commit/3fd43431297a8d6bba517ccf18f2fd2eb6e2f06e :
$ sed -i 's/isContentDispositionField(key) && self._contentLength/false \&\& false/' "$NODE_EXEC_PATH"
(disclaimer: the patched Node.js executable only runs proxy-chain so that the impact may be minimized)