Partitioned: Split watermark from Batch #836

Open
jordanrfrazier opened this issue Oct 30, 2023 · 0 comments

Summary

The current Batch struct carries the watermark (up_to_time) alongside optional data:

#[derive(Clone, PartialEq, Debug)]
pub struct Batch {
    /// The data associated with the batch.
    pub(crate) data: Option<BatchInfo>,

    /// An indication that the batch stream has completed up to the given time.
    /// Any rows in future batches on this stream must have a time strictly
    /// greater than this.
    pub up_to_time: RowTime,
}

Many evaluators are thus forced to reason about the presence or absence of both the watermark and the data, even when they only need one of the two. Separating the watermark from the batch, and passing each only where it is needed, would simplify the logic and improve readability.
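To make the problem concrete, here is a minimal sketch of what a data-only evaluator looks like with the current shape. The `RowTime` and `BatchInfo` definitions and the `double_values` evaluator are hypothetical stand-ins (the real types are not shown in this issue), so treat this as an illustration rather than actual project code.

```rust
// Hypothetical stand-ins: the real `RowTime` and `BatchInfo` are not shown in this issue.
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
pub struct RowTime(pub i64);

#[derive(Clone, PartialEq, Debug)]
pub struct BatchInfo {
    pub values: Vec<i64>,
}

// Same shape as the current `Batch`: optional data plus the watermark.
#[derive(Clone, PartialEq, Debug)]
pub struct Batch {
    pub data: Option<BatchInfo>,
    pub up_to_time: RowTime,
}

/// Hypothetical evaluator that doubles every value. It has no interest in the
/// watermark, yet it must handle the "no data" case and copy `up_to_time` through.
pub fn double_values(input: Batch) -> Batch {
    let data = input.data.map(|info| BatchInfo {
        values: info.values.iter().map(|v| v * 2).collect(),
    });
    Batch {
        data,
        up_to_time: input.up_to_time,
    }
}
```

Even this trivial evaluator has to unwrap the optional data and forward a field it never inspects.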

Possible Solution

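/// The watermark: the stream has completed up to this time.
#[must_use]
pub struct Watermark(RowTime);

/// A batch (possibly absent) paired with its watermark.
pub struct WatermarkedBatch {
    batch: Option<Batch>,
    watermark: Watermark,
}

impl WatermarkedBatch {
    /// Consumes `self` and hands back both the optional batch and the watermark.
    pub fn take(self) -> (Option<Batch>, Watermark) { ... }
}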

So:

  1. The only way to get the batch is to call take
  2. When you call take you get the Watermark
  3. Once you have the Watermark you must use it
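The three points above can be sketched end to end as follows. This is only one possible reading of the proposal: `RowTime` and the data-only `Batch` are hypothetical stand-ins, the body of `take` is an assumed implementation of the elided one, and the `new`, `time`, `double_values`, and `step` helpers are invented for illustration.

```rust
// Hypothetical stand-ins: the real `RowTime` and `Batch` are not shown in this form.
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
pub struct RowTime(pub i64);

// Assumed data-only batch: no watermark field, no `Option` around the data.
#[derive(Clone, PartialEq, Debug)]
pub struct Batch {
    pub values: Vec<i64>,
}

#[must_use]
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Watermark(RowTime);

impl Watermark {
    // Hypothetical constructor and accessor, added for this sketch.
    pub fn new(time: RowTime) -> Self {
        Watermark(time)
    }
    pub fn time(&self) -> RowTime {
        self.0
    }
}

pub struct WatermarkedBatch {
    batch: Option<Batch>,
    watermark: Watermark,
}

impl WatermarkedBatch {
    // Hypothetical constructor, added for this sketch.
    pub fn new(batch: Option<Batch>, watermark: Watermark) -> Self {
        WatermarkedBatch { batch, watermark }
    }

    /// Assumed body: the fields are private, so this is the only way out.
    pub fn take(self) -> (Option<Batch>, Watermark) {
        (self.batch, self.watermark)
    }
}

/// Hypothetical data-only evaluator: it never sees a watermark.
pub fn double_values(batch: Batch) -> Batch {
    Batch {
        values: batch.values.iter().map(|v| v * 2).collect(),
    }
}

/// Hypothetical driver step: split, evaluate the data if present, re-attach the watermark.
pub fn step(input: WatermarkedBatch) -> WatermarkedBatch {
    let (batch, watermark) = input.take();
    WatermarkedBatch::new(batch.map(double_values), watermark)
}
```

Note that `#[must_use]` is a lint rather than a hard guarantee: the compiler warns when the value returned by `take` is discarded outright, but binding the watermark to `_` silences the warning. Point 3 is therefore a strong nudge, not something the type system enforces.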