Description
Describe the bug
For local use, I'm trying to configure the Swift gRPC client's connection backoff so that it:
- Retries quickly with no jitter or exponential backoff
- Gives up and fails the RPC after a short timeout (e.g. 10 sec)
The issue is that once ConnectionBackoffIterator reaches the retry limit, the pool never schedules another connection attempt, so all subsequent RPCs fail after maxWaitTime. Effectively, once there has been a connection failure, the GRPCChannelPool (or ClientConnection) becomes unusable. It also doesn't propagate the connection error until maxWaitTime is reached, even if retries are exhausted before then.
RPCs fail with this error:
deadlineExceeded(connectionError: Optional(unavailable (14): Unexpected connection drop))
deadline exceeded (4): Timed out waiting for an HTTP/2 stream from the connection pool, cause: unavailable (14): Unexpected connection drop
Allowing unlimited retries fixes the issue of the pool becoming unusable, but that's not a viable solution because the pool doesn't seem to stop retrying even after all RPCs have timed out. The server could be stopped and may not be started again for hours or days, and my quick-retry requirement means that the app will be wasting a lot of CPU trying to connect in the background.
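For reference, this is what I mean by "allowing unlimited retries": the same backoff values as in the reproducer below, just with retries: .unlimited instead of a cap.

$0.connectionBackoff = ConnectionBackoff(
  initialBackoff: 0.1,
  maximumBackoff: 0.25,
  multiplier: 1,
  jitter: 0,
  retries: .unlimited
)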
Some RPCs could take a very long time when there are no connection issues, so I can't set a short timeLimit on CallOptions.
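For illustration only, a per-call deadline would look roughly like this (someLongCall stands in for one of my generated client methods), but it would cut off exactly those long-running calls:

// Hypothetical: a short per-call deadline also kills legitimate long RPCs.
let options = CallOptions(timeLimit: .timeout(.seconds(10)))
_ = client.someLongCall(request, callOptions: options)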
I wouldn't mind doing the retry logic myself (even if only temporarily) if the gRPC behavior won't work for my needs, but the same problem occurs with ClientConnection and connectionBackoff = nil, and GRPCChannelPool doesn't seem to support completely bypassing the backoff logic.
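In case it clarifies what I'd do instead, here is a rough sketch of the application-level fallback I have in mind: retry each call with a fixed short delay and an overall deadline, independent of the channel's backoff. callWithRetry, the delay, and doThing below are my own placeholders, not grpc-swift API.

import NIOCore

func callWithRetry<Response>(
  deadline: NIODeadline,
  retryDelay: TimeAmount = .milliseconds(100),
  _ attempt: () async throws -> Response
) async throws -> Response {
  while true {
    do {
      return try await attempt()
    } catch {
      // Give up once the overall deadline has passed; otherwise retry quickly.
      guard NIODeadline.now() < deadline else { throw error }
      try await Task.sleep(nanoseconds: UInt64(retryDelay.nanoseconds))
    }
  }
}

// Usage (doThing is a placeholder for a generated async client method):
// let reply = try await callWithRetry(deadline: .now() + .seconds(10)) {
//   try await client.doThing(request)
// }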
grpc-swift version: 1.14.2
Debug build
Platform: macOS 13
To reproduce
GRPCChannelPool.with(...) {
  $0.connectionPool = GRPCChannelPool.Configuration.ConnectionPool.with { pool in
    pool.maxWaitTime = .seconds(5)
  }
  $0.connectionBackoff = ConnectionBackoff(
    initialBackoff: 0.1,
    maximumBackoff: 0.25,
    multiplier: 1,
    jitter: 0,
    // minimumConnectionTimeout: 0.1,
    retries: .upTo(50)
  )
}
Or with the old ClientConnection API:
var config = ClientConnection.Configuration.default(...)
config.connectionBackoff = ConnectionBackoff(
  initialBackoff: 0.1,
  maximumBackoff: 0.25,
  multiplier: 1,
  jitter: 0,
  // minimumConnectionTimeout: 0.1,
  retries: .upTo(50)
)
config.callStartBehavior = .waitsForConnectivity
return ClientConnection(configuration: config)
I've tried various combinations of retries, minimumConnectionTimeout, maxWaitTime, and callStartBehavior, to no avail.
Expected behaviour
Once the retry limit is reached, waiting RPCs should fail promptly with the underlying connection error (rather than waiting out the full maxWaitTime), and subsequent RPCs should trigger a fresh round of connection attempts instead of failing until the channel is recreated.
Additional information
Logs with retry limit = 1:
2023-04-11T12:34:07-0700 debug grpc : poolmanager.id=ObjectIdentifier(0x0000600002dc8480) poolmanager.pools.conns_per_pool=1 poolmanager.pools.count=1 poolmanager.pools.waiters_per_pool=100 [GRPC] initializing connection pool manager
2023-04-11 12:34:07.806 OrbStack[2862:19592931] start rpc
2023-04-11T12:34:07-0700 debug grpc call opt : grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A pool.waiter.id=ObjectIdentifier(0x0000600001bb1fe0) pool.waiters.count=0 [GRPC] waiting for a connection to become available
2023-04-11T12:34:07-0700 trace grpc : pool.waiters.count=1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] enqueued connection waiter
2023-04-11T12:34:07-0700 debug grpc : pool.reservations.capacity=0 pool.reservations.count=1 pool.reservations.load=inf pool.reservations.loadThreshold=0.9 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] stream reservation load factor greater than or equal to threshold, bringing up additional connection if available
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (0 parts buffered) grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A request_part=metadata [GRPC] buffering request part
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (1 parts buffered) grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A request_part=message [GRPC] buffering request part
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (2 parts buffered) grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A request_part=end [GRPC] buffering request part
2023-04-11T12:34:07-0700 debug grpc call opt : grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B pool.waiter.id=ObjectIdentifier(0x0000600001bb1db0) pool.waiters.count=1 [GRPC] waiting for a connection to become available
2023-04-11T12:34:07-0700 trace grpc : pool.waiters.count=2 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] enqueued connection waiter
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (0 parts buffered) grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B request_part=metadata [GRPC] buffering request part
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (1 parts buffered) grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B request_part=message [GRPC] buffering request part
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (2 parts buffered) grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B request_part=end [GRPC] buffering request part
2023-04-11T12:34:07-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2023-04-11T12:34:07-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] Network.framework is available but the EventLoopGroup is not compatible with NIOTS, falling back to ClientBootstrap
2023-04-11T12:34:07-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] creating a ClientBootstrap
2023-04-11T12:34:07-0700 debug grpc : connectivity_state=connecting grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] activating connection
2023-04-11T12:34:08-0700 debug grpc : connectivity_state=active grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] deactivating connection
2023-04-11T12:34:08-0700 debug grpc : delay_secs=0.1 grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] scheduling connection attempt
2023-04-11T12:34:08-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2023-04-11T12:34:08-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] Network.framework is available but the EventLoopGroup is not compatible with NIOTS, falling back to ClientBootstrap
2023-04-11T12:34:08-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] creating a ClientBootstrap
2023-04-11T12:34:08-0700 debug grpc : connectivity_state=connecting grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] activating connection
2023-04-11T12:34:08-0700 debug grpc : connectivity_state=active grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] deactivating connection
2023-04-11T12:34:08-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] shutting down connection
2023-04-11 12:34:12.968 OrbStack[2862:19592931] RPC returned error: deadlineExceeded(connectionError: Optional(unavailable (14): Unexpected connection drop))
2023-04-11 12:34:12.968 OrbStack[2862:19592931] RPC returned error: deadlineExceeded(connectionError: Optional(unavailable (14): Unexpected connection drop))
2023-04-11T12:34:12-0700 trace grpc call opt : call_state=closing grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A [GRPC] failing buffered writes
2023-04-11T12:34:12-0700 trace grpc call opt : grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A pool.waiter.id=ObjectIdentifier(0x0000600001bb1fe0) pool.waiters.count=1 [GRPC] timed out waiting for a connection
2023-04-11T12:34:12-0700 trace grpc call opt : call_state=closing grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B [GRPC] failing buffered writes
2023-04-11T12:34:12-0700 trace grpc call opt : grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B pool.waiter.id=ObjectIdentifier(0x0000600001bb1db0) pool.waiters.count=0 [GRPC] timed out waiting for a connection
2023-04-11T12:34:13-0700 debug grpc call opt : grpc_request_id=A0FD37C0-5B52-4796-B708-A1E0BF62B861 pool.waiter.id=ObjectIdentifier(0x0000600001b994f0) pool.waiters.count=0 [GRPC] waiting for a connection to become available
2023-04-11T12:34:13-0700 trace grpc : pool.waiters.count=1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] enqueued connection waiter
2023-04-11T12:34:13-0700 debug grpc call opt : grpc_request_id=C98F5343-E420-4DFF-ABB7-7FF4BFDF1EC5 pool.waiter.id=ObjectIdentifier(0x0000600001b99450) pool.waiters.count=1 [GRPC] waiting for a connection to become available
2023-04-11T12:34:13-0700 trace grpc : pool.waiters.count=2 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] enqueued connection waiter
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (0 parts buffered) grpc_request_id=C98F5343-E420-4DFF-ABB7-7FF4BFDF1EC5 request_part=metadata [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (1 parts buffered) grpc_request_id=C98F5343-E420-4DFF-ABB7-7FF4BFDF1EC5 request_part=message [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (2 parts buffered) grpc_request_id=C98F5343-E420-4DFF-ABB7-7FF4BFDF1EC5 request_part=end [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (0 parts buffered) grpc_request_id=A0FD37C0-5B52-4796-B708-A1E0BF62B861 request_part=metadata [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (1 parts buffered) grpc_request_id=A0FD37C0-5B52-4796-B708-A1E0BF62B861 request_part=message [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (2 parts buffered) grpc_request_id=A0FD37C0-5B52-4796-B708-A1E0BF62B861 request_part=end [GRPC] buffering request part