ssh2-streams

Implement a faster sftp createReadStream

Open mmis1000 opened this issue 8 years ago • 3 comments

Since fastGet() uses multiple connections to speed up file downloads, it would be great if we could use the same technique to speed up the stream from createReadStream(). My initial implementation is here: https://github.com/mmis1000/ssh-gateway/blob/c4ff38dc464027b6c4fd09cb91d3294dbeceb1f6/lib/ssh_fast_read_stream.js
In my test, with 16 parallel downloads and a 64 KB chunk size, it achieved a download speed of 30 mbps (about 10x faster than the original read stream).

mmis1000, Jan 19 '18 13:01

fastGet() doesn't use multiple connections; it only uses parallel reads within the same sftp connection. The reason there are no parallel reads in the stream implementations is to mirror the behavior of fs.createReadStream(). Also, doing parallel reads when you have to watch the stream's highWaterMark makes things a little more complicated (currently your implementation ignores the size passed to _read(), which could end up storing more data in memory than needed).

mscdex, Jan 19 '18 14:01
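
For illustration only (this is not code from the thread or from ssh2-streams): a minimal sketch of what "parallel reads within the same sftp connection" could look like. It assumes an object whose read(handle, buffer, offset, length, position, callback) follows the callback shape documented for ssh2's sftp.read(). The names makeFakeSftp and parallelRead are hypothetical, and the in-memory fake session stands in for a real server so the example runs on its own.

```javascript
// Illustrative sketch. `makeFakeSftp` and `parallelRead` are hypothetical
// names; the fake object only imitates the callback shape of sftp.read().

// In-memory stand-in for an SFTP session: answers each read after a delay.
function makeFakeSftp(file, latencyMs) {
  return {
    read(handle, buffer, offset, length, position, callback) {
      setTimeout(() => {
        const end = Math.min(position + length, file.length);
        const bytesRead = file.copy(buffer, offset, position, end);
        callback(null, bytesRead, buffer, position);
      }, latencyMs);
    },
  };
}

// Read `fileSize` bytes with up to `concurrency` requests outstanding on
// one session. Each chunk lands at its own offset in `result`, so replies
// may arrive in any order.
function parallelRead(sftp, handle, fileSize, options, done) {
  const { concurrency = 16, chunkSize = 64 * 1024 } = options;
  const result = Buffer.alloc(fileSize);
  let nextPosition = 0;
  let inFlight = 0;
  let failed = false;

  function fail(err) {
    failed = true;
    done(err);
  }

  function readChunk(position, length) {
    sftp.read(handle, result, position, length, position, (err, bytesRead) => {
      if (failed) return;
      if (err) return fail(err);
      if (bytesRead === 0) return fail(new Error('unexpected end of file'));
      if (bytesRead < length) {
        // Short read: request the remainder of this chunk.
        return readChunk(position + bytesRead, length - bytesRead);
      }
      inFlight--;
      schedule();
    });
  }

  function schedule() {
    while (!failed && inFlight < concurrency && nextPosition < fileSize) {
      const position = nextPosition;
      const length = Math.min(chunkSize, fileSize - position);
      nextPosition += length;
      inFlight++;
      readChunk(position, length);
    }
    if (!failed && inFlight === 0 && nextPosition >= fileSize) done(null, result);
  }

  schedule();
}

// Usage with the fake session: 16 outstanding reads of 64 bytes each.
const file = Buffer.from(Array.from({ length: 1000 }, (_, i) => i % 251));
parallelRead(makeFakeSftp(file, 5), null, file.length, { concurrency: 16, chunkSize: 64 }, (err, data) => {
  if (err) throw err;
  console.log('contents match:', data.equals(file));
});
```

This sketch collects the whole file in memory, so the highWaterMark concern raised above does not apply to it; that concern only appears once the same idea is wrapped in a Readable stream.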

Actually, I intentionally ignore the size passed to _read() and keep the buffer at a constant size, to ensure we always have enough data to pipe into the destination right away.
Fetching data only after the destination has flushed would slow down the connection in a high-latency environment.
Spending a little extra bandwidth and memory to achieve the highest speed makes sense in this case.
Otherwise, if we need to use less memory while keeping the highest speed, we would need to implement something like a dynamic TCP window, but that looks too complicated.

mmis1000, Jan 19 '18 17:01
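
For illustration only (not the implementation linked in the thread): a minimal sketch of the trade-off described above, a Readable that ignores the size hint passed to _read() and instead keeps a fixed window of chunks either in flight or fetched and waiting. The class name PrefetchReadStream, its options, and makeFakeSftp are hypothetical; the fake session again assumes the read(handle, buffer, offset, length, position, callback) shape of ssh2's sftp.read().

```javascript
const { Readable } = require('stream');

// In-memory stand-in for an SFTP session (hypothetical helper). Latency
// varies by position so replies can complete out of order.
function makeFakeSftp(file) {
  return {
    read(handle, buffer, offset, length, position, callback) {
      setTimeout(() => {
        const end = Math.min(position + length, file.length);
        callback(null, file.copy(buffer, offset, position, end), buffer, position);
      }, 1 + (position % 3));
    },
  };
}

// Hypothetical stream: at most `windowSize` chunks are either in flight or
// fetched but not yet pushed, regardless of the size passed to _read().
class PrefetchReadStream extends Readable {
  constructor(sftp, handle, fileSize, { windowSize = 16, chunkSize = 64 * 1024 } = {}) {
    super();
    Object.assign(this, { sftp, handle, fileSize, windowSize, chunkSize });
    this.ready = new Map(); // position -> fetched chunk awaiting its turn
    this.inFlight = 0;
    this.nextIssue = 0;     // next file position to request
    this.nextPush = 0;      // next file position to hand to the consumer
    this.wantsData = false;
    this.eofPushed = false;
  }

  _read(_size) {            // size hint deliberately ignored
    this.wantsData = true;
    this._drainReady();
    this._fill();
  }

  _fill() {
    while (this.inFlight + this.ready.size < this.windowSize && this.nextIssue < this.fileSize) {
      const position = this.nextIssue;
      const length = Math.min(this.chunkSize, this.fileSize - position);
      this.nextIssue += length;
      this.inFlight++;
      this._fetch(position, Buffer.alloc(length), 0);
    }
  }

  _fetch(position, chunk, filled) {
    this.sftp.read(this.handle, chunk, filled, chunk.length - filled, position + filled, (err, bytesRead) => {
      if (this.destroyed) return;
      if (err || bytesRead === 0) return this.destroy(err || new Error('unexpected end of file'));
      filled += bytesRead;
      if (filled < chunk.length) return this._fetch(position, chunk, filled); // short read
      this.inFlight--;
      this.ready.set(position, chunk);
      this._drainReady();
      this._fill();
    });
  }

  // Push fetched chunks strictly in file order while the consumer wants data.
  _drainReady() {
    while (this.wantsData && this.ready.has(this.nextPush)) {
      const chunk = this.ready.get(this.nextPush);
      this.ready.delete(this.nextPush);
      this.nextPush += chunk.length;
      this.wantsData = this.push(chunk);
    }
    if (this.nextPush >= this.fileSize && !this.eofPushed) {
      this.eofPushed = true;
      this.push(null);
    }
  }
}

// Usage with the fake session.
const source = Buffer.from(Array.from({ length: 5000 }, (_, i) => i % 251));
const received = [];
new PrefetchReadStream(makeFakeSftp(source), null, source.length, { windowSize: 4, chunkSize: 100 })
  .on('data', (chunk) => received.push(chunk))
  .on('end', () => console.log('contents match:', Buffer.concat(received).equals(source)));
```

In this sketch the extra memory is bounded by roughly windowSize × chunkSize on top of the stream's own buffer, because fetching pauses once the window is full of chunks the consumer has not yet accepted. A dynamic window, as mentioned above, would adjust windowSize at run time instead of fixing it.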

I honestly don't see something like this being added anytime soon, for the reasons I mentioned. It'd probably be best left to another module building on top of the sftp API.

mscdex, Jan 19 '18 18:01