
Watch API blocked forever even with WithRequireLeader option #12534

Closed
timestee opened this issue Dec 9, 2020 · 7 comments

Comments

@timestee

timestee commented Dec 9, 2020

Possibly a duplicate of #12022, but this happens even with the WithRequireLeader option.

When the connection is dropped due to a restart of the etcd server, reading from the watch channel sometimes blocks forever, even with the WithRequireLeader option. The only output is these error logs:

{"level":"warn","ts":"2020-12-09T20:41:09.730+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-df29e0f0-3bb7-4d66-a7ba-d2b2e4850b07/127.0.0.1:12379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:12379: connect: connection refused\""}
{"level":"warn","ts":"2020-12-09T20:46:14.739+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-df29e0f0-3bb7-4d66-a7ba-d2b2e4850b07/127.0.0.1:12379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:32379: connect: connection refused\""}

Code snippet:

// wc is a valid clientv3.Watcher; respCh, pairs, rev, v, directory, opts and stopCh come from the enclosing scope
go func() {
	defer func() { _ = wc.Close() }()
	defer close(respCh)

	// Push the current value through the channel.
	respCh <- pairs

	rev++
	ctx := context.Background()
	ctx = clientv3.WithRequireLeader(ctx)
	watchCh := wc.Watch(ctx, v.normalize(directory), clientv3.WithPrefix(), clientv3.WithRev(rev))
	for {
		// Check if the watch was stopped by the caller
		select {
		case <-stopCh:
			return
		case <-watchCh: // note: the WatchResponse itself is discarded, so stream errors are never checked
			_, pairs, err := v.list(directory, opts)
			if err != nil {
				return
			}
			respCh <- pairs
		}
	}
}()

Go client version: v3.3.0-rc.0.0.20200720071305-89da79188f73
Server version: etcd 3.4.13

@agargi
Contributor

agargi commented Dec 27, 2020

@timestee Whether a leader exists in the cluster is only checked at the time the Watch is created. The same is true for any gRPC call, since the check is done only in the interceptor and not afterwards. IIUC, what you are asking for could be a future enhancement (though I assume it would need to be done in a way that does not change existing behavior but instead adds a new capability to cancel the watch when there is no leader).

kubernetes/kubernetes#89488
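
For anyone hitting this today, a client-side workaround is possible: since leadership is only validated when the watch stream is established, the reader can pair WithRequireLeader with progress notifications and an idle timeout, and recreate the watch whenever the stream errors or goes quiet. A minimal sketch, assuming a v3.4-era import path; watchWithTimeout and idleTimeout are illustrative names, not part of the etcd client API:

package watcherutil

import (
	"context"
	"time"

	"go.etcd.io/etcd/clientv3" // assumption: v3.4-era import path
)

// watchWithTimeout recreates the watch whenever the stream errors, is canceled,
// or stays silent longer than idleTimeout. With WithProgressNotify the server
// sends periodic empty responses, so prolonged silence suggests a dead stream.
func watchWithTimeout(ctx context.Context, cli *clientv3.Client, prefix string, rev int64, idleTimeout time.Duration) {
	for {
		wctx, cancel := context.WithCancel(clientv3.WithRequireLeader(ctx))
		wch := cli.Watch(wctx, prefix,
			clientv3.WithPrefix(), clientv3.WithRev(rev), clientv3.WithProgressNotify())

	recv:
		for {
			select {
			case resp, ok := <-wch:
				if !ok || resp.Canceled || resp.Err() != nil {
					break recv // stream ended or errored; recreate the watch
				}
				for _, ev := range resp.Events {
					rev = ev.Kv.ModRevision + 1 // resume after the last seen event
					// handle the event here
				}
			case <-time.After(idleTimeout):
				break recv // no events or progress notifications; assume the stream is dead
			case <-ctx.Done():
				cancel()
				return
			}
		}
		cancel()
	}
}

The idle timeout has to be longer than the server's progress-notify interval, otherwise healthy but quiet watches would be torn down needlessly.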

@timestee
Author

@agargi You got my point: etcd should cancel the watch or return an error if the leader is lost. Otherwise, when the leader is lost, the service provider's keep-alive fails, and what should the service provider do then?

  • Should the service provider shut itself down? But the watcher cannot be notified that the leader was lost; it just blocks.
  • Should the service provider ignore the failed keep-alive? But when the etcd server comes back, the watcher will be notified that all registered services were lost because of the TTL. Should the service provider automatically register again now? That seems a bit messy to me (a re-registration sketch follows below).
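
A sketch of the second option, re-registering automatically: when the keep-alive channel closes, grant a new lease and put the registration again. This is a minimal sketch assuming a v3.4-era import path; register, key, val and ttl are illustrative names, not etcd APIs:

package registry

import (
	"context"
	"time"

	"go.etcd.io/etcd/clientv3" // assumption: v3.4-era import path
)

// register keeps a lease-backed key alive and re-registers it whenever the
// keep-alive stream dies (for example after an etcd restart lets the TTL expire).
func register(ctx context.Context, cli *clientv3.Client, key, val string, ttl int64) error {
	for {
		if ctx.Err() != nil {
			return ctx.Err()
		}
		lease, err := cli.Grant(ctx, ttl)
		if err != nil {
			time.Sleep(time.Second) // assumption: simple fixed retry backoff
			continue
		}
		if _, err := cli.Put(ctx, key, val, clientv3.WithLease(lease.ID)); err != nil {
			time.Sleep(time.Second)
			continue
		}
		kaCh, err := cli.KeepAlive(ctx, lease.ID)
		if err != nil {
			time.Sleep(time.Second)
			continue
		}
		// Drain keep-alive responses; once the channel closes the lease is no
		// longer being refreshed, so loop and register again.
		for range kaCh {
		}
	}
}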

@embano1

embano1 commented Jan 18, 2021

I went through the same thing; here's my conclusion from a while ago: kubernetes/kubernetes#89488 (comment)

@stale

stale bot commented Apr 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 18, 2021
@stale stale bot removed the stale label Apr 19, 2021
@xiang90
Contributor

xiang90 commented Apr 19, 2021

It is a nice-to-have feature. etcd should probably check whether the server maintains leadership for long-running requests.
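
Until something like that exists server-side, a client can approximate it by polling an endpoint's Status and canceling the watch context when no leader is reported, so a reader blocked on the watch channel is released. A minimal sketch; cancelOnLeaderLoss, the endpoint argument and the poll interval are illustrative:

package watcherutil

import (
	"context"
	"time"

	"go.etcd.io/etcd/clientv3" // assumption: v3.4-era import path
)

// cancelOnLeaderLoss polls one endpoint's Status and cancels the supplied
// context when the endpoint is unreachable or reports no leader (Leader == 0).
func cancelOnLeaderLoss(ctx context.Context, cancel context.CancelFunc, cli *clientv3.Client, endpoint string) {
	ticker := time.NewTicker(5 * time.Second) // assumption: poll interval
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sctx, scancel := context.WithTimeout(ctx, 2*time.Second)
			resp, err := cli.Status(sctx, endpoint)
			scancel()
			if err != nil || resp.Leader == 0 {
				cancel() // release anyone blocked on the watch channel
				return
			}
		}
	}
}

Canceling on any Status error is deliberately conservative; a production version would probably tolerate a few transient failures before giving up.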

@stale

stale bot commented Jul 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@346999452

The client code can cause this result in extreme scenarios; I have reported it as a bug:
#17611
