Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add windowing and aggregation #18
base: main
Are you sure you want to change the base?
WIP: Add windowing and aggregation #18
Changes from 1 commit
b111d37
d2fc3a4
8650bb9
197695e
578ba7c
9894dd7
9c97f19
7670446
f835c46
383d8da
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the group by key may not be mandatory. Let's say you want to simulate something like
Basically accumulate data in a counter or any other data sketch without grouping in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily.
Windowing is needed to produce intermediate results on an unbounded stream. So every time this step emits results we need windowing. But there are stateful primitives that do not emit results. Think about the counter in postgres. We just count and periodically store in postgres.
This about how you would model this https://www.notion.so/sentry/Data-workers-1808b10e4b5d806983f3f0e22a2c2758?pvs=4#1858b10e4b5d80c98207e29786a1317a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed offline, we'll support that use case (for now) using windowed aggregation + sink.
So windowing will be required in this iteration of the API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are missing a watermark content. Basically the windowing algorithm decides the window semantics, but the user has to specify how/when to close a window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added by allowing users to configure windows and triggers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another point. Windows that close by event time are inherently tied to the watermarking strategy.
This strategy can be set directly on the source especially if we're using something like Kafka. We can also allow users to configure their own as part of a step like Reduce, though I believe that's not recommended because that would mean a new watermarked stream is being produced as part of the step (at least for Flink).
WDYT about how we should configure watermarking strategy? I feel that it makes sense to set it up directly on the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the storage hidden by the accumulator at times ?
Think about these cases:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the same discussion we had related to #18 (comment)
I can leave a TODO to come back to this in a next PR since this once is getting large
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a sense on which accumulators we will provide out of the box ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet. I think that might be a good idea for next PR. Leaving a TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hint. Try to make this generic rather than have
Any
as type everywhere.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do, right now just getting things working with Flink