-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make client and server to resync active connections #74
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to follow what we discussed during our meeting.
Given the potential deadlock mentioned in the comments below, I think this requires more testing. It should be tested in a custom Rancher build and it would be beneficial imo to have integration tests for this, though I understand that right now remotedialer doesn't have this in place.
e8d07f6
to
1bdf8ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are mostly nits at this point, since I am not yet familiar with this project. I may have to come back and think about the solution itself once I learn a bit more about remotedialer.
session.go
Outdated
if PrintTunnelData { | ||
defer logrus.Debugf("CONNECTIONS %d %d", s.sessionKey, len(s.conns)) | ||
} | ||
return conn | ||
} | ||
|
||
// lockedRemoveConnection removes a given connection from the session. The session lock must be held by the caller when calling this method | ||
func (s *Session) lockedRemoveConnection(connID int64) *connection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's strange that an unexported method's has some context in its name - that it must be called when the session lock is held. This function is sort of unnecessarily "colored". If this were an exported function, I'd understand. But it's odd that there is no locking in this method but instead this directive in the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a common practice in Go codebases, which purpose is to warn callers that it's their responsibility to held a lock in order to access the protected resource. Many examples can be found in the Kubernetes codebase or the Go standard library itself:
sync.Map
- https://github.com/search?q=repo%3Akubernetes%2Fkubernetes%20locked&type=code
- https://github.com/search?q=repo%3Agolang%2Fgo%20locked&type=code
Nonetheless, I just realized I didn't apply this convention correctly, since I'm using locked
as a prefix instead of as a suffix, so I'll change that (ee938dc).
session_sync.go
Outdated
|
||
// diffSortedSetsGetRemoved compares two sorted slices and returns those items present in a that are not present in b | ||
// similar to coreutil's "comm -23" | ||
func diffSortedSetsGetRemoved(a, b []int64) (res []int64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this function's name is not intuitive. Also, named function return values tend to be avoided, though golangci permits them when there is no configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry but this bikeshed has already been repainted too many times already in this PR, from using a simpler for loop to this partial lists diff algorithm, and also renames (previously named removedFromSlice
but it was not clear enough).
Please let me know if you have a better suggestion, otherwise I will stick to this name + the explanatory comment in order to provide enough context to the reader.
I will remove the named return value though, since it does not provide any value here, thanks.
session_sync.go
Outdated
|
||
// closeMissingConnections closes any session connection that is not present in the IDs received from the client | ||
// The session lock must be hold by the caller when calling this method | ||
func (s *Session) closeStaleConnections(clientIDs []int64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name and signature of this imply that the method receives a list of IDs of connections to close. But there is filtering going on. Perhaps this method should be renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice point, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 14c3d27
@@ -75,3 +76,76 @@ func TestSession_sessionKeys(t *testing.T) { | |||
t.Errorf("incorrect number of remote client keys after removal, got: %d, want %d", got, want) | |||
} | |||
} | |||
|
|||
func TestSession_activeConnectionIDs(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider running this test with its subtests in parallel, since the method under test is locked but available to multiple readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Please note that, since the session is not shared between tests cases, they will all have their own lock though
Issue: 44576
Relates to #68
Depends on #78
Problem
remotedialer allows multiplexing connections between two peers using a single websocket connections by including a connection ID in the messages and using separate buffers. The protocol specifies different message types for different actions (
Connect
,Data
,Error
,Pause
,Resume
, etc.). In particular, theError
type is used to communicate the other end that a certain connection must be closed. However, depending on the cause of the original error, this message may never be successfully transmitted, as the sender will give up on sending it (#67 adds additional logging for this situation).When this happens, one of the peers will never receive a termination message for that connection, making the underlying buffers to get stuck on
Read()
forever, hence causing goroutine and memory leaks.Solution
This PR adds a new message type to the protocol (
Resync
), whose payload contains a list of connection IDs. Similarly to how clients sendsPing
control messages,Resync
messages will periodically tell the receiving peer that any connection not contained in the provided list is no longer needed and can be pruned.Small caveat: we cannot use Control messages for this purpose, since websocket set a limit of 125 bytes for their payload, which would impose a tight restriction on the number of connections.
CheckList