
Socket leak

Open mpup371 opened this issue 1 year ago • 3 comments

Hello, I have an application that uploads files to a remote server using sftp. I upload files with io.Copy(). Last night the remote server's file system was full, causing io.Copy to return an SSH_FX_FAILURE error.

On each error we retry the upload over the same connection, assuming the connection is still fine. But eventually, after a while, the process fails with "too many open files". Using lsof on the process I found thousands of open sockets to the remote server. (This is reproducible.)

It seems that when there is an error on a connection, the socket remains open and the library opens a new one, causing a leak. Can you confirm that? Is it "normal" behavior? Thanks for your answer. jf

mpup371 avatar Feb 06 '25 16:02 mpup371

I’m going to need more details: the function where the io.Copy() is located (or at least a generic model thereof), and a listing of these open sockets.

You see, there’s no reconnection logic in our library. We only open a new socket in NewClient, and that socket stays alive for the lifetime of all requests on that client. Is it possible that you’re seeing thousands of open files rather than sockets? I could see that happening if the file handles are not being closed in the function calling io.Copy, or are not being closed before a retry is attempted. (This is why I need the model of the function calling io.Copy.)

Perhaps you are also falling into a known gotcha, where the code looks like:

for {
  f, _ := sftpClient.OpenFile(…)
  defer f.Close() // scheduled for function exit, not for the end of this iteration

  _, err := io.Copy(…)
  if err == nil {
    break // or return
  }
}

In the above code, each defer is scheduled to run at the end of the whole function containing the for loop, not as an “on continue” operation at the end of each iteration. In this specific case, a retry loop would accumulate open files from the OpenFile calls without closing any of them until the function returns. With fast enough retries, this would exhaust the available handles on the server, and present the behavior you’re pointing at, except it would be open files, not open sockets.

puellanivis avatar Feb 06 '25 19:02 puellanivis

Hello, thank you for your quick answer. Since the problem happens in a production environment, I cannot reproduce it easily, but I can confirm the problem was open sockets, not files. However, I suppose that when I do

conn, err := ssh.Dial("tcp", host.Url.Host, &config)
cx, err = sftp.NewClient(conn, ...)
dstFile, err = cx.OpenFile(ftmp, (os.O_WRONLY | os.O_CREATE | os.O_TRUNC))

on the remote connection, the remote file descriptor creates a socket?

I will review my code and the loop more deeply with your advices, and come back to you if I can give more details. regards jf

mpup371 avatar Feb 07 '25 09:02 mpup371

If you are indeed performing:

conn, err := ssh.Dial("tcp", host.Url.Host, &config)
cx, err = sftp.NewClient(conn, ...)
dstFile, err = cx.OpenFile(ftmp, (os.O_WRONLY | os.O_CREATE | os.O_TRUNC))

per upload, then you are indeed not reusing connections, and the leak would indeed be open sockets. (More precisely, the ssh.Dial is what opens the sockets.) If this is also being done in a for loop similar to my for { client.OpenFile… } example, then you would not be closing the dialed sockets properly either, because each defer would still only execute at the end of the function, not during the loop.

You should not need to perform a new ssh.Dial for every upload. SSH connections are relatively long-lived and can be reused.

puellanivis avatar Feb 07 '25 11:02 puellanivis