SftpClient: Hang after ~500 MB with Dropbear server
After downloading about 500 MB of data, I'm getting a hang in DownloadFile.
I tested with files of various sizes: with a 1 MB (1,048,576 bytes) test file, 475 downloads work, and it deadlocks on the 476th one. With 2 MB files, 237 work, and it locks on the 238th. In all cases, it deadlocks shortly before 499,000,000 bytes.
This only seems to be an issue with the Dropbear SSH server with OpenSSH's sftp. When using OpenSSH on a desktop Linux, this issue does not occur.
I am assuming that this is not (strictly speaking) a bug in SSH.Net. Since it works with OpenSSH, I'm assuming it's Dropbear not acting quite right. But supporting slightly misbehaving SSH servers is a reasonable thing for SSH.Net to do.
Here is my test program.
using Renci.SshNet;
using System.Diagnostics;
namespace SshTransferTest;
internal class Program
{
// This is my small embedded device. Low power. Running PetaLinux on a Xilinx SoC.
// SSH Server is Dropbear v2020.80. (No, I can't change that.)
// (Why 127.0.0.1? I'm on a work computer, with a strict VPN. I have to use an SSH tunnel
// to make connections to machines on the local network.)
const string host = "127.0.0.1";
const int port = 2122;
const string user = "root";
const string privKey = "DeviceKey.id_rsa";
const string largeFile = "/mnt/data/testLargeFile"; // 1.0 MB of /dev/urandom
static void Main(string[] args)
{
Stopwatch sw = new();
void callback(object? _)
{
Console.WriteLine("Running for {0}", sw.Elapsed);
if (sw.Elapsed > TimeSpan.FromSeconds(30))
Debugger.Break();
}
Timer lockupTimer = new(callback);
SftpClient sftpClient = new(
host, port, user,
new PrivateKeyFile(typeof(Program).Assembly.GetManifestResourceStream(typeof(Program), privKey)!));
sftpClient.Connect();
for (int i = 0; i < 1000000; i++)
{
Console.WriteLine("{0}: Starting Download", i);
lockupTimer.Change(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));
sw.Restart();
using (MemoryStream ms = new())
{
sftpClient.DownloadFile(largeFile, ms);
}
sw.Stop();
lockupTimer.Change(Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
Console.WriteLine("{0}: Complete in {1}", i, sw.Elapsed);
// I tried having a sleep here, but it didn't seem to make any difference.
//Thread.Sleep(1000);
}
}
}
When it deadlocks, here's what the call stack looks like:
- Renci.SshNet.dll!Renci.SshNet.Sftp.SftpFileReader.Read() Line 106
- Renci.SshNet.dll!Renci.SshNet.SftpClient.InternalDownloadFile(string path, System.IO.Stream output, Renci.SshNet.Sftp.SftpDownloadAsyncResult asyncResult, System.Action
downloadCallback) Line 1371 - Renci.SshNet.dll!Renci.SshNet.SftpClient.DownloadFile(string path, System.IO.Stream output, System.Action
downloadCallback) Line 801 - SshTransferTest.dll!SshTransferTest.Program.Main(string[] args) Line 62
Here's the exact line where it's deadlocks in SftpFileReader.Read():
public byte[] Read()
{
ThrowHelper.ThrowObjectDisposedIf(_disposingOrDisposed, this);
if (_exception != null)
{
ExceptionDispatchInfo.Capture(_exception).Throw();
}
if (_isEndOfFileRead)
{
throw new SshException("Attempting to read beyond the end of the file.");
}
BufferedRead value;
lock (_readLock)
{
while (!_queue.TryGetValue(_nextChunkIndex, out value) && _exception == null)
{
Monitor.Wait(_readLock); //<-------------------
}
Other things I tried
Disconnect/Reconnect
for (int i = 0; i < 1000000; i++)
{
if (i % 200 == 0)
{
sftpClient.Disconnect();
sftpClient.Connect();
}
Console.WriteLine("{0}: Starting Download", i);
With this change, the deadlock does not occur. (Tested about 120,000 iterations.) So it appears to be a limit on the number of bytes transferred per SFTP connection, not per SftpClient object lifetime.
Different Download Method
//using (MemoryStream ms = new())
//{
// sftpClient.DownloadFile(largeFile, ms);
//}
using (var sftpStream = sftpClient.OpenRead(largeFile))
{
byte[] temp = new byte[65536];
long red;
while ((red = sftpStream.Read(temp)) != 0)
{
// Keep reading
lockupTimer.Change(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));
}
}
With this change, the issue is still present. It still deadlocks, and still on the 476th download. With a different array size, still a deadlock on the 476th download.
Here is what I think is happening:
SSH.NET opens a channel with a (very) large initial window size (2GB):
https://github.com/sshnet/SSH.NET/blob/03ae6bfa0f1bda7635d3a4eb8888bebf3fab1105/src/Renci.SshNet/Session.cs#L58
https://github.com/sshnet/SSH.NET/blob/03ae6bfa0f1bda7635d3a4eb8888bebf3fab1105/src/Renci.SshNet/Channels/ChannelSession.cs#L365-L367
dropbear clamps it to 500MB:
https://github.com/mkj/dropbear/blob/16106997d11615e5e2dfe477def062aed7ed0bca/src/sysoptions.h#L246
https://github.com/mkj/dropbear/blob/16106997d11615e5e2dfe477def062aed7ed0bca/src/common-channel.c#L909
SSH.NET only sends a window_adjust message after its window is practically depleted i.e. after 2GB of data received. LocalPacketSize is 64KB here:
https://github.com/sshnet/SSH.NET/blob/03ae6bfa0f1bda7635d3a4eb8888bebf3fab1105/src/Renci.SshNet/Channels/Channel.cs#L775-L780
So dropbear sends 500MB of data, assumes the window is depleted, and doesn't receive a message to adjust the window. So it does not send any more data.
So who is at fault? Well granted, the SSH.NET behaviour is questionable, but it appears to be in spec:
The 'initial window size' specifies how many bytes of channel data can be sent to the sender of this message without adjusting the window.
(https://datatracker.ietf.org/doc/html/rfc4254#section-5.1)
Implementations MUST correctly handle window sizes of up to 2^32 - 1 bytes.
(https://datatracker.ietf.org/doc/html/rfc4254#section-5.2)
I do agree though that the library should be less unusual here so that this issue would not be hit