Open
Description
When an established connection is breaking for some reason (because the TCP Ack does not come within the TCP User Timeout, for example) re-establishing the connection is bound by the connection timeout - it just cannot happen faster. Consider the following scenario:
- The connection is breaking at A side and A.1 sends the connect request to B.
- B.2 receives the request, but the remote endpoint is still valid there, so it replies with the ConnectionRequestCrossed (code) and tries to probe the existing connection.
- The probing fails and B.1 sends its connection request to A.
- A.1 gets ConnectionRequestCrossed reply and gets stuck on the "resolved" MVar at findRemoteEndPoint until the connection timeout fires.
- A.2 receives the connection request and replies with ConnectionRequestCrossed.
- B.3 receives the ConnectionRequestCrossed reply and puts the "crossed" MVar.
- The connection timeout fires on A.1, it releases the "resolved" MVar and removes the stale endpoint. The new connection request is sent to B.
- B.2 immediately accepts the request (because the "crossed" MVar was already set).
Here is the log of the described scenario:
ssu1: May 07 11:27:05 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188193 sent=100951
ssu1: May 07 11:27:05 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=87242 sent=65160
ssu1: May 07 11:27:06 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=44503 sent=26171
ssu1: May 07 11:27:06 ssu1 p[6344]: runScheduledAction: exception=ProcessLinkException pid://172.28.128.18:9070:0:485043 DiedNormal remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.Internal.mapIOException (src/Network/Transport/Internal.hs:149:1-42)","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=125734 sent=72780
ssu1: May 07 11:27:07 ssu1 p[6344]: handleIncomingMessages.prematureExit: err=user error (recvExact: Socket closed) st=RemoteEndPointValid
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: exception=Network.Socket.ByteString.sendMany: resource vanished (Broken pipe) remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))","Network.Transport.TCP.apiConnect (src/Network/Transport/TCP.hs:(765,1)-(779,11))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: remoteState=RemoteEndPointFailed: user error (recvExact: Socket closed)
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: exception=Network.Socket.ByteString.sendMany: timeout (Connection timed out) remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))","Network.Transport.TCP.apiConnect (src/Network/Transport/TCP.hs:(765,1)-(779,11))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: remoteState=RemoteEndPointFailed: Network.Socket.ByteString.sendMany: resource vanished (Broken pipe)
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: exception=Network.Socket.ByteString.sendMany: invalid argument (Bad file descriptor) remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))","Network.Transport.TCP.apiConnect (src/Network/Transport/TCP.hs:(765,1)-(779,11))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: remoteState=RemoteEndPointFailed: Network.Socket.ByteString.sendMany: timeout (Connection timed out)
ssu1: May 07 11:27:07 ssu1 p[6344]: resetIfBroken: remoteEP failed
ssu1: May 07 11:27:07 ssu1 p[6344]: removeRemoteEndPoint: 172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,True,18)
ssu1: May 07 11:27:07 ssu1 p[6344]: socketToEndPoint: AddrInfo {addrFlags = [AI_ADDRCONFIG,AI_V4MAPPED], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 172.28.128.19:9070, addrCanonName = Nothing}
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,False,2)
ssu5: May 07 11:27:08 ssu5 p[6375]: handleConnectionRequest.go: ConnectionRequestCrossed: 172.28.128.18:60992
ssu5: May 07 11:27:08 ssu5 p[6375]: ProbeSocket: 172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: handleIncomingMessages.prematureExit: err=Network.Socket.recvBuf: invalid argument (Bad file descriptor) st=RemoteEndPointValid
ssu5: May 07 11:27:08 ssu5 p[6375]: resetIfBroken: remoteEP failed
ssu5: May 07 11:27:08 ssu5 p[6375]: removeRemoteEndPoint: 172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: createConnectionTo.go: ourAddr=172.28.128.19:9070:0 theirAddr=172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,True,17)
ssu5: May 07 11:27:08 ssu5 p[6375]: socketToEndPoint: AddrInfo {addrFlags = [AI_ADDRCONFIG,AI_V4MAPPED], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 172.28.128.18:9070, addrCanonName = Nothing}
ssu5: May 07 11:27:08 ssu5 p[6375]: createConnectionTo.go: ourAddr=172.28.128.19:9070:0 theirAddr=172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: createConnectionTo.go: ourAddr=172.28.128.19:9070:0 theirAddr=172.28.128.18:9070:0
ssu1: May 07 11:27:08 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:08 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:08 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,18)
ssu1: May 07 11:27:08 ssu1 p[6344]: handleConnectionRequest.go: ConnectionRequestCrossed: 172.28.128.19:36454
-------------------- connectTimeout is fired !!!
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: removeRemoteEndPoint 172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: removeRemoteEndPoint: 172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,True,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: socketToEndPoint: AddrInfo {addrFlags = [AI_ADDRCONFIG,AI_V4MAPPED], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 172.28.128.19:9070, addrCanonName = Nothing}
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1024
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1025
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1026
ssu1: May 07 11:27:09 ssu1 p[6344]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1027
ssu1: May 07 11:27:09 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=125734 sent=82536
ssu1: May 07 11:27:09 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=43198 sent=10136
ssu1: May 07 11:27:09 ssu1 p[6344]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.19:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,True,17)
ssu5: May 07 11:27:09 ssu5 p[6375]: handleConnectionRequest.go: ConnectionRequestAccepted: 172.28.128.18:60994
ssu5: May 07 11:27:09 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,False,17)
ssu5: May 07 11:27:09 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,False,17)
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: createConnectionTo.go: 172.28.128.18:9070:0 connId=1024
ssu5: May 07 11:27:09 ssu5 p[6375]: createConnectionTo.go: 172.28.128.18:9070:0 connId=1025
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: Network.Socket.ByteString.sendMany: total=125661 sent=86880
ssu1: May 07 11:27:11 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188257 sent=86880
ssu1: May 07 11:27:11 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188193 sent=153488
ssu1: May 07 11:27:11 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188257 sent=157832
ssu1: May 07 11:27:14 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=273038 sent=218648
In this case, the connection timeout (connectTimeout) was 2 seconds.
Is this a bug or it's how it is supposed to work?
Originally posted by @andriytk in haskell-distributed/network-transport-tcp#74 (comment)