Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make client and server to resync active connections #74

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

aruiz14
Copy link
Contributor

@aruiz14 aruiz14 commented Apr 5, 2024

Issue: 44576

Relates to #68
Depends on #78

Problem

remotedialer allows multiplexing connections between two peers using a single websocket connections by including a connection ID in the messages and using separate buffers. The protocol specifies different message types for different actions (Connect, Data, Error, Pause, Resume, etc.). In particular, the Error type is used to communicate the other end that a certain connection must be closed. However, depending on the cause of the original error, this message may never be successfully transmitted, as the sender will give up on sending it (#67 adds additional logging for this situation).

When this happens, one of the peers will never receive a termination message for that connection, making the underlying buffers to get stuck on Read() forever, hence causing goroutine and memory leaks.

Solution

This PR adds a new message type to the protocol (Resync), whose payload contains a list of connection IDs. Similarly to how clients sends Ping control messages, Resync messages will periodically tell the receiving peer that any connection not contained in the provided list is no longer needed and can be pruned.

Small caveat: we cannot use Control messages for this purpose, since websocket set a limit of 125 bytes for their payload, which would impose a tight restriction on the number of connections.

CheckList

  • Test

session_resync.go Outdated Show resolved Hide resolved
connection.go Show resolved Hide resolved
session_resync.go Outdated Show resolved Hide resolved
message.go Outdated Show resolved Hide resolved
session_resync.go Outdated Show resolved Hide resolved
session_resync.go Outdated Show resolved Hide resolved
connection.go Show resolved Hide resolved
moio
moio previously approved these changes Apr 9, 2024
Copy link
Contributor

@moio moio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thanks

session_sync.go Show resolved Hide resolved
session.go Outdated Show resolved Hide resolved
session_sync.go Outdated Show resolved Hide resolved
session.go Outdated Show resolved Hide resolved
session_sync.go Outdated Show resolved Hide resolved
Copy link
Contributor

@tomleb tomleb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to follow what we discussed during our meeting.

Given the potential deadlock mentioned in the comments below, I think this requires more testing. It should be tested in a custom Rancher build and it would be beneficial imo to have integration tests for this, though I understand that right now remotedialer doesn't have this in place.

session_sync.go Show resolved Hide resolved
session_sync.go Outdated Show resolved Hide resolved
message.go Show resolved Hide resolved
session.go Outdated Show resolved Hide resolved
session_sync.go Show resolved Hide resolved
types.go Outdated Show resolved Hide resolved
session_sync_test.go Outdated Show resolved Hide resolved
moio
moio previously approved these changes Jun 13, 2024
aruiz14 added a commit to aruiz14/remotedialer that referenced this pull request Jun 13, 2024
Copy link

@maxsokolovsky maxsokolovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are mostly nits at this point, since I am not yet familiar with this project. I may have to come back and think about the solution itself once I learn a bit more about remotedialer.

message.go Outdated Show resolved Hide resolved
session.go Outdated
if PrintTunnelData {
defer logrus.Debugf("CONNECTIONS %d %d", s.sessionKey, len(s.conns))
}
return conn
}

// lockedRemoveConnection removes a given connection from the session. The session lock must be held by the caller when calling this method
func (s *Session) lockedRemoveConnection(connID int64) *connection {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's strange that an unexported method's has some context in its name - that it must be called when the session lock is held. This function is sort of unnecessarily "colored". If this were an exported function, I'd understand. But it's odd that there is no locking in this method but instead this directive in the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a common practice in Go codebases, which purpose is to warn callers that it's their responsibility to held a lock in order to access the protected resource. Many examples can be found in the Kubernetes codebase or the Go standard library itself:

Nonetheless, I just realized I didn't apply this convention correctly, since I'm using locked as a prefix instead of as a suffix, so I'll change that (ee938dc).

session_sync.go Outdated

// diffSortedSetsGetRemoved compares two sorted slices and returns those items present in a that are not present in b
// similar to coreutil's "comm -23"
func diffSortedSetsGetRemoved(a, b []int64) (res []int64) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this function's name is not intuitive. Also, named function return values tend to be avoided, though golangci permits them when there is no configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry but this bikeshed has already been repainted too many times already in this PR, from using a simpler for loop to this partial lists diff algorithm, and also renames (previously named removedFromSlice but it was not clear enough).
Please let me know if you have a better suggestion, otherwise I will stick to this name + the explanatory comment in order to provide enough context to the reader.

I will remove the named return value though, since it does not provide any value here, thanks.

session_sync.go Outdated

// closeMissingConnections closes any session connection that is not present in the IDs received from the client
// The session lock must be hold by the caller when calling this method
func (s *Session) closeStaleConnections(clientIDs []int64) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name and signature of this imply that the method receives a list of IDs of connections to close. But there is filtering going on. Perhaps this method should be renamed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice point, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 14c3d27

session_sync_test.go Show resolved Hide resolved
session_sync_test.go Show resolved Hide resolved
session_sync_test.go Show resolved Hide resolved
@@ -75,3 +76,76 @@ func TestSession_sessionKeys(t *testing.T) {
t.Errorf("incorrect number of remote client keys after removal, got: %d, want %d", got, want)
}
}

func TestSession_activeConnectionIDs(t *testing.T) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider running this test with its subtests in parallel, since the method under test is locked but available to multiple readers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Please note that, since the session is not shared between tests cases, they will all have their own lock though

session_test.go Show resolved Hide resolved
session_sync_test.go Outdated Show resolved Hide resolved
session_test.go Outdated Show resolved Hide resolved
session_sync_test.go Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants