
Watch response revision can be lower than the required starting revision #19487

Open · Congyu-Liu opened this issue Feb 26, 2025 · 3 comments

@Congyu-Liu

What happened?

Hi developers! When testing etcd, we found a bug that can cause a watch response revision to be less than the required starting revision (e.g., watch key --rev 4 returns a response with revision=3).

What did you expect to happen?

The watch response revision should not be less than the starting revision.

How can we reproduce it (as minimally and precisely as possible)?

The following steps trigger this bug on a follower node (a request-level sketch for step 1 follows the list):

  1. The node receives a watch request with a starting revision in the future (i.e., greater than its current revision).
  2. It restores a snapshot from the leader node ((s *watchableStore) Restore(b backend.Backend) is called), e.g., after a network partition; at this point, the watch revision is still greater than the current revision.
  3. syncWatchers() is called.
  4. Some put()s are executed.
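
For step 1, here is a minimal sketch at the gRPC API level, assuming the go.etcd.io/etcd/api/v3/etcdserverpb types; the key and revision values are illustrative, not taken from an actual run:

    package main

    import (
        "fmt"

        pb "go.etcd.io/etcd/api/v3/etcdserverpb"
    )

    func main() {
        // Step 1: a watch create request whose StartRevision is assumed to be
        // greater than the store's current revision.
        req := &pb.WatchRequest{
            RequestUnion: &pb.WatchRequest_CreateRequest{
                CreateRequest: &pb.WatchCreateRequest{
                    Key:           []byte("key"),
                    StartRevision: 1000, // far ahead of the current revision
                },
            },
        }
        fmt.Println(req.GetCreateRequest().StartRevision)
    }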

It seems that when restoring the snapshot, the node will move the synced watchers to the unsynced group:

for wa := range s.synced.watchers {
    wa.restore = true
    s.unsynced.add(wa)
}

Later, when syncWatchers() is called, it lowers the watcher's minRev to curRev + 1 and moves the watcher back to the synced group; subsequent put()s are then notified to this watcher even though their revisions are less than the original watch revision:

for w := range wg.watchers {
    if w.minRev < compactionRev {
        // Skip the watcher that failed to send compacted watch response due to w.ch is full.
        // Next retry of syncWatchers would try to resend the compacted watch response to w.ch
        continue
    }
    w.minRev = curRev + 1
    eb, ok := wb[w]
    if !ok {
        // bring un-notified watcher to synced
        s.synced.add(w)
        s.unsynced.delete(w)
        continue
    }
    // ...
}
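
One possible direction for a fix, sketched below as a hypothetical helper rather than an actual etcd patch, would be to avoid lowering a minRev that still points into the future:

    package main

    import "fmt"

    // nextMinRev is a hypothetical helper (an assumption, not etcd code): it
    // computes a watcher's new minRev without ever lowering a start revision
    // that is still ahead of the store's current revision.
    func nextMinRev(minRev, curRev int64) int64 {
        if minRev > curRev+1 {
            // The watcher asked to start at a future revision; keeping it
            // means events below that revision are never delivered.
            return minRev
        }
        return curRev + 1
    }

    func main() {
        // A restored watcher with start revision 1000 while the store is at 3:
        fmt.Println(nextMinRev(1000, 3)) // 1000, instead of the unconditional 4
    }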

Hope this is helpful.

Anything else we need to know?

No response

Etcd version (please run commands below)

3.5.18

Etcd configuration (command line flags or environment variables)

No response

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

No response

Relevant log output

No response

@serathius
Member

Don't think we provide any guarantees about the RV in watch responses, as long as it's not a progress notification. It is just the local etcd RV at the time the response was created. As responses are queued up to 100 deep, they can be arbitrarily stale.

I don't think the current semantics of the RV in watch responses are useful, and fixing the issue you described would not make them any more useful.

@Congyu-Liu
Author

Hi @serathius! Thank you for the response.

What we are observing is that the watch request does not respect the given start revision: it returns responses with a lower revision than the start revision. I think the watch start-revision semantics are clearly defined in the API documentation:

Start_Revision - An optional revision for where to inclusively begin watching. If not given, it will stream events following the revision of the watch creation response header revision. The entire available event history can be watched starting from the last compaction revision.

and the robustness model:

if e.Revision < watch.Revision || !e.Match(watch) {
    continue
}
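
To restate the invariant that the snippet encodes, here is a small self-contained sketch; the Event type and function name are hypothetical stand-ins, not the actual robustness-model code:

    package main

    import "fmt"

    // Event is a hypothetical stand-in for the robustness model's event type.
    type Event struct {
        Revision int64
    }

    // violatesStartRevision reports whether any delivered event sits below
    // the watch's start revision, which the model treats as a violation.
    func violatesStartRevision(events []Event, startRev int64) bool {
        for _, e := range events {
            if e.Revision < startRev {
                return true
            }
        }
        return false
    }

    func main() {
        // The reported case: a response with revision 3 on a watch started at 4.
        fmt.Println(violatesStartRevision([]Event{{Revision: 3}}, 4)) // true
    }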

I am not sure this is related to staleness (or maybe I am missing something).

@Congyu-Liu
Author

Congyu-Liu commented Feb 27, 2025

@serathius Below are the steps to reproduce (a client-side sketch follows the list). In summary, we first send a watch request with a future starting revision to a follower, then trigger the leader to send a snapshot to this node, and finally run some puts.

  1. Set up a 3-node cluster, say node A (leader), B (follower), and C (follower); --snapshot-count and --experimental-snapshot-catchup-entries are set to 1 to increase the chance of transferring a snapshot.
  2. Send watch k --prefix --rev 1000 to B.
  3. Partition A <-> B; then run some put requests and move the leader to C. This should effectively trigger C to send snapshots to B, which is lagging behind.
  4. Run put requests with keys under the prefix k. We then see responses from the previous watch request, even though the current revision is less than 1000 (the expected behavior is that we see nothing before the revision reaches 1000).
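
A minimal sketch of steps 2 and 4 with clientv3; the endpoint, key names, and values are illustrative assumptions, not from an actual run:

    package main

    import (
        "context"
        "fmt"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        // Step 2: connect to follower B (endpoint is illustrative).
        cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        const startRev = 1000
        wch := cli.Watch(context.Background(), "k", clientv3.WithPrefix(), clientv3.WithRev(startRev))

        // Step 4 (run after the partition and snapshot transfer of step 3):
        // a few puts under the watched prefix, all far below revision 1000.
        go func() {
            for i := 0; i < 5; i++ {
                if _, err := cli.Put(context.Background(), fmt.Sprintf("k%d", i), "v"); err != nil {
                    panic(err)
                }
            }
        }()

        for resp := range wch {
            for _, ev := range resp.Events {
                if ev.Kv.ModRevision < startRev {
                    // With the bug, this fires; the expected behavior is
                    // silence until the store revision reaches 1000.
                    fmt.Printf("unexpected event at revision %d (< %d)\n", ev.Kv.ModRevision, startRev)
                }
            }
        }
    }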

Hope it helps.
