Partitioned: Split watermark from Batch #836

jordanrfrazier · 2023-10-30T16:39:25Z

Summary

The current Batch carries the watermark together with optional data:

#[derive(Clone, PartialEq, Debug)]
pub struct Batch {
    /// The data associated with the batch.
    pub(crate) data: Option<BatchInfo>,

    /// An indication that the batch stream has completed up to the given time.
    /// Any rows in future batches on this stream must have a time strictly
    /// greater than this.
    pub up_to_time: RowTime,
}

Many evaluators are thus forced to reason about the presence or absence of the watermark and data without really needing to. A good refactoring would be to separate the watermark from the batch and pass each only where it is needed, which should simplify the logic and improve readability.

Possible Solution

#[must_use]
pub struct Watermark(RowTime);

pub struct WatermarkedBatch {
  batch: Option<Batch>,
  watermark: Watermark,
}

impl WatermarkedBatch {
  pub fn take(self) -> (Option<Batch>, Watermark) { ... }
}

So:

- The only way to get the batch is to call take.
- When you call take, you get the Watermark.
- Once you have the Watermark, you must use it.
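
To make the three points above concrete, here is a minimal sketch of how the proposed types might fit together. It is only an illustration under stated assumptions: RowTime and Batch are simplified stand-ins for the real types, and new, time, sum_rows, and process are hypothetical helpers that do not appear in the proposal. Note that #[must_use] is a lint, so ignoring the watermark would produce a compiler warning rather than a hard error.

```rust
// Stand-in for the real RowTime; the i64 payload is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct RowTime(pub i64);

// Stand-in for the data-only Batch; the `rows` field is an assumption.
#[derive(Clone, Debug, PartialEq)]
pub struct Batch {
    pub rows: Vec<i64>,
}

/// Indicates the stream has completed up to the wrapped time.
#[must_use]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Watermark(RowTime);

impl Watermark {
    /// Hypothetical accessor for the wrapped time.
    pub fn time(&self) -> RowTime {
        self.0
    }
}

pub struct WatermarkedBatch {
    batch: Option<Batch>,
    watermark: Watermark,
}

impl WatermarkedBatch {
    /// Hypothetical constructor; not part of the proposal.
    pub fn new(batch: Option<Batch>, up_to_time: RowTime) -> Self {
        Self {
            batch,
            watermark: Watermark(up_to_time),
        }
    }

    /// Consumes the wrapper. Because the fields are private, this is the
    /// only way to reach the batch, and it always hands back the watermark.
    pub fn take(self) -> (Option<Batch>, Watermark) {
        (self.batch, self.watermark)
    }
}

/// Hypothetical evaluator that only cares about data and never sees
/// a watermark.
pub fn sum_rows(batch: &Batch) -> i64 {
    batch.rows.iter().sum()
}

/// Hypothetical pipeline step: splits the input, runs the evaluator on the
/// data if there is any, and forwards the watermark downstream.
pub fn process(input: WatermarkedBatch) -> (Option<i64>, Watermark) {
    let (batch, watermark) = input.take();
    let total = batch.as_ref().map(sum_rows);
    (total, watermark)
}
```

With this shape, only the pipeline step that calls take has to handle the Option and the Watermark. An evaluator such as sum_rows receives a plain Batch and needs no watermark logic at all.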

bjchambers mentioned this issue Oct 30, 2023

feat: partitioned merge pipeline #828

Merged
