
consuming: reset to nearest if we receive OOOR while fetching #628

Merged (1 commit), Dec 6, 2023


@twmb (Owner) commented Dec 3, 2023

If we receive OFFSET_OUT_OF_RANGE (OOOR) while fetching after a previous fetch succeeded, something odd happened in the broker. Either what we were consuming was truncated underfoot (e.g., by retention), which is normal and expected for very slow consumers, or data loss occurred without a leadership transfer. We now reset to the nearest offset after our previously consumed offset, looked up by time rather than by offset, because that is the most valid choice: we previously had a valid offset, and now it is invalid.

Closes #621.
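A minimal Go sketch of the reset rule described above, for illustration only. It is not franz-go's implementation: `record`, `offsetForTimestamp`, and `resetOffset` are hypothetical names, the partition is an in-memory slice, and timestamps are assumed to be non-decreasing with offset. The lookup mimics a ListOffsets-by-timestamp request (first offset whose timestamp is >= the target); asking for one millisecond past the last consumed record so that we land strictly after it is an assumption of this sketch, since the thread does not say exactly which timestamp the client queries.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical stand-in for one record in a partition.
type record struct {
	Offset    int64
	Timestamp int64 // milliseconds
}

// offsetForTimestamp mimics a ListOffsets lookup by timestamp: it returns the
// first offset whose timestamp is >= ts, or logEnd if no record qualifies.
// It assumes timestamps are non-decreasing with offset so it can binary search.
func offsetForTimestamp(log []record, logEnd, ts int64) int64 {
	i := sort.Search(len(log), func(i int) bool { return log[i].Timestamp >= ts })
	if i == len(log) {
		return logEnd
	}
	return log[i].Offset
}

// resetOffset picks where to resume after OFFSET_OUT_OF_RANGE.
// If a fetch previously succeeded, the old offset was valid and now is not,
// so resume at the first record strictly after the last consumed one by time
// (ts+1 turns ">=" into ">"). Otherwise fall back to a configured policy,
// hard-coded here as "start of the log" (hypothetical).
func resetOffset(log []record, logStart, logEnd int64, fetchedBefore bool, lastConsumedTS int64) int64 {
	if fetchedBefore {
		return offsetForTimestamp(log, logEnd, lastConsumedTS+1)
	}
	return logStart
}

func main() {
	// Made-up partition: 13..15 were lost and later re-produced with newer
	// timestamps. We last consumed offset 12 at t=1200.
	log := []record{{11, 1100}, {12, 1200}, {13, 5300}, {14, 5400}, {15, 5500}}
	fmt.Println(resetOffset(log, 11, 16, true, 1200))  // 13: by time
	fmt.Println(resetOffset(log, 11, 16, false, 1200)) // 11: fallback policy
}
```

The `fetchedBefore` flag is what separates "our offset became invalid" from "our starting offset was never valid"; only the former uses the time-based lookup.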

@twmb twmb mentioned this pull request Dec 3, 2023
@twmb twmb merged commit 3134cb2 into master Dec 6, 2023
6 checks passed
@twmb twmb deleted the 621 branch December 6, 2023 05:34
@nvartolomei

@twmb may I ask why we prefer resetting by time? Say the consumer consumed offset 10, then issues a fetch and gets an OFFSET_OUT_OF_RANGE error with log start offset 15. Why doesn't it just move to offset 15 instead of doing the timequery dance?

@twmb (Owner, Author) commented Apr 30, 2024

@nvartolomei this dance addresses data loss at the end of the log, not at the start:

  • Broker has 11, 12, 13, 14, 15
  • Client consumes 11, 12
  • Broker loses 13, 14, 15
  • Client asks for 13
  • Broker returns OOOR
  • Second client produces 13, 14, 15

The best offset to reset to is the new 13, which is, by time, the first record after 12. Any other offset-reset scheme can only pick the start or the end: picking the start means reconsuming the whole topic, which we want to avoid, and picking the end misses the new 13, 14, and 15.
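The scenario above can be replayed in a small in-memory Go model to compare the three choices. This is a hypothetical sketch, not broker or franz-go code; the timestamps are invented and assume the second client produces after the original records were written, with timestamps increasing with offset. The last case replays the log-start truncation from the earlier question (consumed offset 10, log start offset 15), where a time-based lookup lands on the same offset as jumping to the log start.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical (offset, timestamp-in-ms) pair in one partition.
type record struct {
	Offset    int64
	Timestamp int64
}

// firstAfter returns the first offset whose timestamp is strictly greater
// than ts, or logEnd if none. It assumes timestamps increase with offset.
func firstAfter(log []record, logEnd, ts int64) int64 {
	i := sort.Search(len(log), func(i int) bool { return log[i].Timestamp > ts })
	if i == len(log) {
		return logEnd
	}
	return log[i].Offset
}

func main() {
	// Broker state after losing the original 13, 14, 15 and a second client
	// producing new 13, 14, 15 later in time. The client last consumed 12 at t=1200.
	endLoss := []record{{11, 1100}, {12, 1200}, {13, 5300}, {14, 5400}, {15, 5500}}
	const endLossLogEnd = 16

	fmt.Println("reset to start:", endLoss[0].Offset)                       // 11: re-consumes 11, 12
	fmt.Println("reset to end:  ", int64(endLossLogEnd))                    // 16: skips new 13, 14, 15
	fmt.Println("reset by time: ", firstAfter(endLoss, endLossLogEnd, 1200)) // 13: the new 13

	// Log-start truncation: the client consumed 10 at t=1000, and retention
	// removed everything before 15.
	truncated := []record{{15, 1500}, {16, 1600}}
	fmt.Println("truncated, by time:", firstAfter(truncated, 17, 1000)) // 15: same as log start
}
```

In the truncation case the time lookup costs an extra request but agrees with the log start offset; only in the end-loss case does it pick something neither "start" nor "end" can.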

@nvartolomei commented Apr 30, 2024 via email

@twmb (Owner, Author) commented May 7, 2024

Correct: if there are no bugs, this code should not be necessary. This PR was added as a bit of a paranoia guard in case a broker is buggy and loses data but does NOT bump the leader epoch. There was worry that this happened in RP (Redpanda) once, which caused a client to reset to the beginning. Now, at least, the client will not reset to the beginning even if the broker has a bug.

Linked issue: Feature request: KIP-842