Connection backoff with limited retries hangs indefinitely

### Describe the bug

For local use, I'm trying to configure the Swift gRPC client's connection backoff so that it:

- Retries quickly with no jitter or exponential backoff
- Gives up and fails the RPC after a short timeout (e.g. 10 sec)

The issue is that once ConnectionBackoffIterator reaches the retry limit, the pool never schedules another connection attempt — all future RPCs fail after maxWaitTime. So effectively, once there has been a connection failure, the GRPCChannelPool (or ClientConnection) becomes unusable. It also doesn't propagate the connection error until maxWaitTime is reached, even if retries are exhausted before that.

RPCs fail with this error:

```
deadlineExceeded(connectionError: Optional(unavailable (14): Unexpected connection drop))
deadline exceeded (4): Timed out waiting for an HTTP/2 stream from the connection pool, cause: unavailable (14): Unexpected connection drop
```

Allowing unlimited retries fixes the issue of the pool becoming unusable, but that's not a viable solution because the pool doesn't seem to stop retrying even after all RPCs have timed out. The server could be stopped and may not be started again for hours or days, and my quick-retry requirement means that the app will be wasting a lot of CPU trying to connect in the background.

Some RPCs could take a very long time when there are no connection issues, so I can't set a short timeLimit on CallOptions.

I wouldn't mind doing the retry logic myself (even if temporarily) if the gRPC behavior won't work for my needs, but the same occurs with ClientConnection and `connectionBackoff = nil`, and GRPCChannelPool doesn't seem to support completely bypassing the backoff logic.

grpc-swift version: 1.14.2
Debug build
Platform: macOS 13

### To reproduce

```swift
GRPCChannelPool.with(...) {
    $0.connectionPool = GRPCChannelPool.Configuration.ConnectionPool.with { pool in
        pool.maxWaitTime = .seconds(5)
    }
    $0.connectionBackoff = ConnectionBackoff(
        initialBackoff: 0.1,
        maximumBackoff: 0.25,
        multiplier: 1,
        jitter: 0,
        //minimumConnectionTimeout: 0.1,
        retries: .upTo(50)
    )
}
```

Or with the old ClientConnection API:

```swift
var config = ClientConnection.Configuration.default(...)
config.connectionBackoff = ConnectionBackoff(
        initialBackoff: 0.1,
        maximumBackoff: 0.25,
        multiplier: 1,
        jitter: 0,
        //minimumConnectionTimeout: 0.1,
        retries: .upTo(50)
)
config.callStartBehavior = .waitsForConnectivity
return ClientConnection(configuration: config)
```

I've tried various combinations of `retries`, `minimumConnectionTimeout`, `maxWaitTime`, and `callStartBehavior` to no avail.

### Expected behaviour

Describe what you expect to happen when the steps above are followed.

### Additional information

Logs with retry limit = 1:
```
2023-04-11T12:34:07-0700 debug grpc : poolmanager.id=ObjectIdentifier(0x0000600002dc8480) poolmanager.pools.conns_per_pool=1 poolmanager.pools.count=1 poolmanager.pools.waiters_per_pool=100 [GRPC] initializing connection pool manager
2023-04-11 12:34:07.806 OrbStack[2862:19592931] start rpc
2023-04-11T12:34:07-0700 debug grpc call opt : grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A pool.waiter.id=ObjectIdentifier(0x0000600001bb1fe0) pool.waiters.count=0 [GRPC] waiting for a connection to become available
2023-04-11T12:34:07-0700 trace grpc : pool.waiters.count=1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] enqueued connection waiter
2023-04-11T12:34:07-0700 debug grpc : pool.reservations.capacity=0 pool.reservations.count=1 pool.reservations.load=inf pool.reservations.loadThreshold=0.9 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] stream reservation load factor greater than or equal to threshold, bringing up additional connection if available
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (0 parts buffered) grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A request_part=metadata [GRPC] buffering request part
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (1 parts buffered) grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A request_part=message [GRPC] buffering request part
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (2 parts buffered) grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A request_part=end [GRPC] buffering request part
2023-04-11T12:34:07-0700 debug grpc call opt : grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B pool.waiter.id=ObjectIdentifier(0x0000600001bb1db0) pool.waiters.count=1 [GRPC] waiting for a connection to become available
2023-04-11T12:34:07-0700 trace grpc : pool.waiters.count=2 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] enqueued connection waiter
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (0 parts buffered) grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B request_part=metadata [GRPC] buffering request part
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (1 parts buffered) grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B request_part=message [GRPC] buffering request part
2023-04-11T12:34:07-0700 trace grpc call opt : call_state=awaitingTransport (2 parts buffered) grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B request_part=end [GRPC] buffering request part
2023-04-11T12:34:07-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2023-04-11T12:34:07-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] Network.framework is available but the EventLoopGroup is not compatible with NIOTS, falling back to ClientBootstrap
2023-04-11T12:34:07-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] creating a ClientBootstrap
2023-04-11T12:34:07-0700 debug grpc : connectivity_state=connecting grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] activating connection
2023-04-11T12:34:08-0700 debug grpc : connectivity_state=active grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] deactivating connection
2023-04-11T12:34:08-0700 debug grpc : delay_secs=0.1 grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/0 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] scheduling connection attempt
2023-04-11T12:34:08-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2023-04-11T12:34:08-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] Network.framework is available but the EventLoopGroup is not compatible with NIOTS, falling back to ClientBootstrap
2023-04-11T12:34:08-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] creating a ClientBootstrap
2023-04-11T12:34:08-0700 debug grpc : connectivity_state=connecting grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] activating connection
2023-04-11T12:34:08-0700 debug grpc : connectivity_state=active grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] deactivating connection
2023-04-11T12:34:08-0700 debug grpc : grpc_connection_id=A72D788A-D053-4D99-B714-58835717CF64/1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] shutting down connection




2023-04-11 12:34:12.968 OrbStack[2862:19592931] RPC returned error: deadlineExceeded(connectionError: Optional(unavailable (14): Unexpected connection drop))
2023-04-11 12:34:12.968 OrbStack[2862:19592931] RPC returned error: deadlineExceeded(connectionError: Optional(unavailable (14): Unexpected connection drop))
2023-04-11T12:34:12-0700 trace grpc call opt : call_state=closing grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A [GRPC] failing buffered writes
2023-04-11T12:34:12-0700 trace grpc call opt : grpc_request_id=35EA3191-EF0A-4010-841B-91B173B6692A pool.waiter.id=ObjectIdentifier(0x0000600001bb1fe0) pool.waiters.count=1 [GRPC] timed out waiting for a connection
2023-04-11T12:34:12-0700 trace grpc call opt : call_state=closing grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B [GRPC] failing buffered writes
2023-04-11T12:34:12-0700 trace grpc call opt : grpc_request_id=EF7FFE9A-D02D-40E3-82A4-33A50C034E9B pool.waiter.id=ObjectIdentifier(0x0000600001bb1db0) pool.waiters.count=0 [GRPC] timed out waiting for a connection
2023-04-11T12:34:13-0700 debug grpc call opt : grpc_request_id=A0FD37C0-5B52-4796-B708-A1E0BF62B861 pool.waiter.id=ObjectIdentifier(0x0000600001b994f0) pool.waiters.count=0 [GRPC] waiting for a connection to become available
2023-04-11T12:34:13-0700 trace grpc : pool.waiters.count=1 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] enqueued connection waiter
2023-04-11T12:34:13-0700 debug grpc call opt : grpc_request_id=C98F5343-E420-4DFF-ABB7-7FF4BFDF1EC5 pool.waiter.id=ObjectIdentifier(0x0000600001b99450) pool.waiters.count=1 [GRPC] waiting for a connection to become available
2023-04-11T12:34:13-0700 trace grpc : pool.waiters.count=2 poolmanager.id=ObjectIdentifier(0x0000600002dc8480) [GRPC] enqueued connection waiter
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (0 parts buffered) grpc_request_id=C98F5343-E420-4DFF-ABB7-7FF4BFDF1EC5 request_part=metadata [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (1 parts buffered) grpc_request_id=C98F5343-E420-4DFF-ABB7-7FF4BFDF1EC5 request_part=message [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (2 parts buffered) grpc_request_id=C98F5343-E420-4DFF-ABB7-7FF4BFDF1EC5 request_part=end [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (0 parts buffered) grpc_request_id=A0FD37C0-5B52-4796-B708-A1E0BF62B861 request_part=metadata [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (1 parts buffered) grpc_request_id=A0FD37C0-5B52-4796-B708-A1E0BF62B861 request_part=message [GRPC] buffering request part
2023-04-11T12:34:13-0700 trace grpc call opt : call_state=awaitingTransport (2 parts buffered) grpc_request_id=A0FD37C0-5B52-4796-B708-A1E0BF62B861 request_part=end [GRPC] buffering request part
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Connection backoff with limited retries hangs indefinitely #1586

Describe the bug

To reproduce

Expected behaviour

Additional information

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Connection backoff with limited retries hangs indefinitely #1586

Description

Describe the bug

To reproduce

Expected behaviour

Additional information

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions