Skip to content

Conversation

@javier-aliaga
Copy link
Contributor

@javier-aliaga javier-aliaga commented Oct 27, 2025

Description

This PR introduces a re-connection mechanism to the sftp binding.

Issue reference

We strive to have all PR being opened based on an issue, where the problem or feature have been discussed prior to implementation.

Please reference the issue this PR will close: #4061

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • Code compiles correctly
  • Created/updated tests
  • Extended the documentation
    • Created the dapr/docs PR:

Note: We expect contributors to open a corresponding documentation PR in the dapr/docs repository. As the implementer, you are the best person to document your work! Implementation PRs will not be merged until the documentation PR is opened and ready for review.

@javier-aliaga javier-aliaga force-pushed the sftp-connections branch 19 times, most recently from a331e1d to f79c833 Compare November 3, 2025 12:52
@javier-aliaga javier-aliaga marked this pull request as ready for review November 3, 2025 12:53
@javier-aliaga javier-aliaga requested review from a team as code owners November 3, 2025 12:53
@javier-aliaga javier-aliaga force-pushed the sftp-connections branch 2 times, most recently from d211516 to d43bfab Compare November 3, 2025 13:24
Comment on lines 186 to 187
c.rLock.Lock()
defer c.rLock.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to follow the logic when we have 2 locks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment to explain why is each lock used, let me know what do you think

msg := strings.ToLower(err.Error())
switch {
case strings.Contains(msg, "use of closed network connection"),
strings.Contains(msg, "connection reset by peer"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not use typed errors here? syscall.ECONNRESET

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add to the previous if(L230), but I will keep the message here just in case.

Copy link
Contributor Author

@javier-aliaga javier-aliaga Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the method, we will not reconnect if it is one of this errors:

if errors.Is(err, sftpClient.ErrSSHFxPermissionDenied) ||
		errors.Is(err, sftpClient.ErrSSHFxNoSuchFile) ||
		errors.Is(err, sftpClient.ErrSSHFxOpUnsupported) {
		return false
	}

Any other error we will try to reconnect

@javier-aliaga javier-aliaga force-pushed the sftp-connections branch 2 times, most recently from 69d8f53 to c3594d7 Compare November 12, 2025 10:44
Copy link
Contributor

@JoshVanL JoshVanL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still finding it hard to follow the logic here. We are not locking at the top of functions and we seem to be setting and unsetting need re-connection in multiple places.

@javier-aliaga javier-aliaga force-pushed the sftp-connections branch 5 times, most recently from 7353dde to a5af6f4 Compare November 18, 2025 12:23
if !c.needsReconnect.Load() {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move below lock

Comment on lines 206 to 207
pErr := c.ping()
if pErr != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pErr := c.ping()
if pErr != nil {
err := c.ping()
if err == nil {
return nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the c.ping works? should we set needsReconnect to false?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think so :)

Comment on lines 219 to 220
oldSftp := c.sftpClient
oldSSH := c.sshClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for these temp variables.

Suggested change
oldSftp := c.sftpClient
oldSSH := c.sshClient
if c.sftpClient != nil {
c.sftpClient.Close()
}
if c.sshClient != nil {
c.sshClient.Close()
}

Comment on lines 169 to 175
if c.shouldReconnect(err) {
c.log.Debugf("sftp binding error: %s", err)
c.needsReconnect.Store(true)
} else {
c.lock.RUnlock()
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if c.shouldReconnect(err) {
c.log.Debugf("sftp binding error: %s", err)
c.needsReconnect.Store(true)
} else {
c.lock.RUnlock()
return err
}
if !c.shouldReconnect(err) {
c.lock.RUnlock()
return err
}
c.log.Debugf("sftp binding error: %s", err)
c.needsReconnect.Store(true)

}
c.lock.RUnlock()

if c.needsReconnect.Load() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if c.needsReconnect.Load() {

c.lock.RUnlock()

if c.needsReconnect.Load() {
c.log.Debugf("sftp binding: reconnecting to %s", c.address)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move log line inside doReconnect, and only after we have locked and checked nedsReconnect.

@javier-aliaga javier-aliaga force-pushed the sftp-connections branch 3 times, most recently from 78a33bd to cc88c05 Compare November 18, 2025 14:07
JoshVanL
JoshVanL previously approved these changes Nov 18, 2025
Copy link
Contributor

@cicoyle cicoyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few things from me. Thx!

}

func withReconnection(c *Client, fn func() error) error {
c.lock.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This locking then calling to reconnect then locking again feels like maybe we should have an abstraction function with a for loop where we only lock in that one place versus in a few places. Maybe it could be cleaner if we had a client.do func that had a for loop where we check the client, then do the shouldReconnect and reconnect calls - this way we try reconnecting more than once otherwise, as is we only try once I believe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not want to put more complex mechanism with retries and backoffs, I think this component it may need to be rewritten as some customers requested different improvements. I did a small refactor to introduce a do func where we lock the action 51c4e94

what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it can be rewritten and improved further. plz open an issue so we dont forget and it can be prioritized at a later date 👍🏻

Copy link
Contributor

@cicoyle cicoyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thx Javi!

@cicoyle cicoyle merged commit b4481d2 into dapr:main Nov 20, 2025
124 of 125 checks passed
@cicoyle cicoyle added this to the v1.17 milestone Nov 20, 2025
javier-aliaga added a commit to javier-aliaga/components-contrib that referenced this pull request Nov 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SFTP Binding seems to overload SFTP server with connections

4 participants