proxy-chain icon indicating copy to clipboard operation
proxy-chain copied to clipboard

Forward mode blocks indefinitely upon certain content-disposition header value

Open starrify opened this issue 1 year ago • 2 comments

Overview

proxy-chain is observed to block indefinitely when certain conditions are met:

  1. It works in the forward mode (plain HTTP).
  2. There is content-disposition header that contains certain non-ASCII content.
  3. There is content-length header.

The root cause of issue is believed to originate from Node.js. Before an upstream fix is made (which might take time) proxy-chain here could apply some mitigation.

Upstream Issue: ServerResponse.writeHead may throw an error

There are already several upstream issue reports, one of which is https://github.com/nodejs/node/issues/50213

Mostly relevant is this section of code https://github.com/nodejs/node/blob/v20.16.0/lib/_http_outgoing.js#L554-L567 where the improper type change later involves implicit type conversion and unexpected charset decoding at https://github.com/nodejs/node/blob/v20.16.0/lib/_http_common.js#L216-L225

A minimal example for the decoding error: /[^\t\x20-\x7e\x80-\xff]/v.test(Buffer.from('à', 'latin1')) This then result in an invalid character error to be thrown (code) during ServerResponse.writeHead.

Impact to proxy-chain

As of the current revision, said issue would cause an error thrown from this call: https://github.com/apify/proxy-chain/blob/a1ac3f287e895270f4e34306ac754a62a9848fa4/src/forward.ts#L97-L101

It is then silently ignore here therefore the observed blocking: https://github.com/apify/proxy-chain/blob/a1ac3f287e895270f4e34306ac754a62a9848fa4/src/forward.ts#L110-L113

In earlier versions of proxy-chain (that is used when I noticed this issue), said error is not caught thus exposed. This leads to the error as reported here: https://github.com/apify/proxy-chain/issues/20

Sample Steps to Reproduce the Issue

Target web server:

$ socat -v -v TCP-LISTEN:8080,fork,reuseaddr,crlf SYSTEM:'echo HTTP/1.1 200 OK; echo "content-length: 6"; echo -e "content-disposition: \\\\xe0"; echo; echo -n "foobar"'

proxy-chain instance (just the minimal example from README.md):

const ProxyChain = require('proxy-chain');

const server = new ProxyChain.Server({ port: 8000 });

server.listen(() => {
    console.log(`Proxy server is listening on port ${8000}`);
});

Sending the request:

$ curl -v -x localhost:8000 'http://localhost:8080'

Expected result: The request shall finish. (see also: curl -v 'http://localhost:8080') Observed result: The request blocks indefinitely. (checked with Node.js v20.16.0 which is the active LTS)

Proposed Actions

I'm afraid that it's unlikely that the upstream will make a fix soon, considering how long their tickets have been around (e.g. https://github.com/nodejs/node/issues/50213).

There are a few proposals on what could be done here:

  1. To manipulate the content-disposition header ahead, so that the call of ServerResponse.writeHead no longer fails.
  2. Or, alter the response handler upon forwarding, so that the upstream error no longer blocks the request. Perhaps letting it fail is (slightly?) better than blocking.
  3. Also, maybe expose the error message to users so that they may have an idea on what's going on.

starrify avatar Aug 16 '24 12:08 starrify

Hello @starrify, thank you for the report and thorough description. I'm thinking what would be the best course of action here and I'm wondering: For the points you've mentioned in "Proposed Actions", what do you mean by "manipulate the content-disposition header"?

In my opinion, the second solution sounds good, letting the handler fail is much better than blocking the request. Sadly, we don't have that much capacity to work on this right now, would you be interested in submitting a pull request? I'd review it and release it afterwards.

Cheers

jirimoravcik avatar Aug 21 '24 09:08 jirimoravcik

mean by "manipulate the content-disposition header"

Hi and sorry for not being clear on that. One example may be to first perform certain transcoding (e.g. encodeURI) on that header value so as to ensure later it won't fail the Node.js code.

Since Node.js's behavior indeed messes up the header value via its superfluous (and incorrect) trasncoding, doing encodeURI in proxy-chain would indeed help retain the original header value.

BTW in case it might help, I am in the project from my side taking a kind-of-dirty-yet-quick approach by patching the Node.js executable to disable the behavior introduced in https://github.com/nodejs/node/commit/3fd43431297a8d6bba517ccf18f2fd2eb6e2f06e :

$  sed -i 's/isContentDispositionField(key) && self._contentLength/false \&\&                                        false/' "$NODE_EXEC_PATH"

(disclaimer: the patched Node.js executable only runs proxy-chain so that the impact may be minimized)

starrify avatar Aug 21 '24 21:08 starrify