
Socket leak #615

Open
mpup371 opened this issue Feb 6, 2025 · 3 comments

Comments

@mpup371

mpup371 commented Feb 6, 2025

Hello,
I have an application that uploads files to a remote server using SFTP. The files are uploaded with io.Copy().
Last night the remote server's file system filled up, causing io.Copy to return an SSH_FX_FAILURE error.

On each error we retry the upload over the same connection, assuming the connection is still fine. But eventually the process crashes with "too many open files". Running lsof on the process, I found thousands of open sockets to the remote server. (This is reproducible.)

It seems that when an error occurs on a connection, the socket stays open and the library opens a new one, causing the leak.
Can you confirm that? Is this "normal" behavior?
Thanks for your answer.
jf

@puellanivis
Collaborator

I’m going to need more details: the function where the io.Copy() is located (or at least a generic model thereof), and a listing of these open sockets.

You see, there’s no reconnection logic in our library. We only create a new socket in NewClient, and that socket stays alive for the lifetime of all requests. Is it possible that you’re seeing thousands of open files rather than sockets? That would be plausible if the file handles are not being closed in the function calling io.Copy, or are not being closed before a retry is attempted. (This is why I need the model of the function calling io.Copy.)

Perhaps, also you might be falling into a known gotcha where there is some code like:

for {
	f, err := sftpClient.OpenFile(…)
	if err != nil {
		return err
	}
	defer f.Close() // runs at function exit, not at the end of each iteration

	if _, err := io.Copy(…); err == nil {
		break // or return
	}
}

In the above code, each defer is scheduled to run at the end of the whole function containing the for loop, not as an “on continue” operation executed before the next iteration begins. In that case, a retry loop accumulates open files from OpenFile without closing any of them until the function returns. With fast enough retries, this exhausts the available handles on the server and presents the behavior you’re pointing at, except it would be open files, not open sockets.

@mpup371
Author

mpup371 commented Feb 7, 2025

Hello,
thank you for your quick answer.
Since the problem happens in a production environment, I cannot reproduce it easily, but I can confirm the problem was open sockets, not files.
However, I suppose that when I do

conn, err := ssh.Dial("tcp", host.Url.Host, &config)
cx, err = sftp.NewClient(conn, ...)
dstFile, err = cx.OpenFile(ftmp, (os.O_WRONLY | os.O_CREATE | os.O_TRUNC))

on the remote connection, each remote file descriptor creates a socket?

I will review my code and the loop more carefully with your advice, and come back to you if I can provide more details.
regards
jf

@puellanivis
Collaborator

If you are indeed performing:

conn, err := ssh.Dial("tcp", host.Url.Host, &config)
cx, err = sftp.NewClient(conn, ...)
dstFile, err = cx.OpenFile(ftmp, (os.O_WRONLY | os.O_CREATE | os.O_TRUNC))

per upload, then you are indeed not reusing connections, and the issue would indeed be too many open sockets. (More precisely, it is the ssh.Dial that opens the sockets.) If this is also being done in a for loop similar to my for { client.OpenFile… } example above, then you would not be closing the dialed sockets properly either, because the defer would still only execute at the end of the function, not during the loop.

You should not need to be doing a new ssh.Dial on every upload. They are relatively long-lived and can be reused.
