Add statistical aggregation functions #4208

Merged
dominiklohmann merged 2 commits into main from topic/mean-aggregation-function on May 17, 2024

Conversation

dominiklohmann
Member

@dominiklohmann dominiklohmann commented May 11, 2024

I needed `mean` for an ingress/egress chart and was surprised we didn't have that yet. This was easy enough to implement, so I just went ahead and did it.

Based on the initial review feedback this also adds `stddev`, `variance`, `approximate_median`, and `collect`.
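
For readers unfamiliar with incremental aggregation, here is a minimal sketch of how `mean`, `variance`, and `stddev` can be computed in a single pass using Welford's algorithm. This is not taken from the PR: the class name `RunningStats` and its methods are hypothetical, and the choice of population variance (dividing by the count) is an assumption, since the conversation does not say which definition the implementation uses.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical one-pass accumulator using Welford's algorithm."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        """Fold one value into the running mean and squared deviations."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the deviation from the old mean and from the updated mean.
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Population variance (assumption: the PR may define it differently)."""
        if self.count == 0:
            raise ValueError("variance of an empty group is undefined")
        return self.m2 / self.count

    def stddev(self) -> float:
        """Square root of the population variance."""
        return math.sqrt(self.variance())


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(value)

print(stats.mean, stats.variance(), stats.stddev())
```

The sample data has a mean of 5, a population variance of 4, and a standard deviation of 2, so the printed values should agree with those up to floating-point rounding. The single-pass form matters for an aggregation function because events arrive incrementally and the accumulator never needs to retain the individual values.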

@dominiklohmann dominiklohmann added the feature (New functionality) and operator (Source, transformation, and sink) labels May 11, 2024
@dominiklohmann dominiklohmann force-pushed the topic/mean-aggregation-function branch from c5996ca to 96d4b79 Compare May 11, 2024 12:38
@mavam
Member

mavam commented May 11, 2024

Could you also add something for median and standard deviation, assuming it's as easy? They often go together.

@dominiklohmann
Member Author

> Could you also add something for median and standard deviation, assuming it's as easy? They often go together.

Bit harder to do, but certainly not a problem. If I find some more time to hack on code this weekend I'll consider it, otherwise feel free to just use mean as a template.

@dominiklohmann dominiklohmann force-pushed the topic/mean-aggregation-function branch from 96d4b79 to 3fcef64 Compare May 11, 2024 15:01
@dominiklohmann dominiklohmann changed the title Add a mean aggregation function Add statistical aggregation functions May 11, 2024
@dominiklohmann dominiklohmann requested a review from mavam May 11, 2024 15:01
@dominiklohmann
Member Author

> Could you also add something for median and standard deviation, assuming it's as easy? They often go together.

Done. Mind giving this another review given that there were major changes since Jannis' initial approval?

@dominiklohmann dominiklohmann force-pushed the topic/mean-aggregation-function branch from 3fcef64 to 1ef6c4c Compare May 11, 2024 15:14
Member

@mavam mavam left a comment

Great work!

Approving modulo missing integration tests.

I'm a bit worried about the error bounds of using t-digest for incremental median computation, but having even an approximate value is already very helpful. IIRC the error bounds do not hold when composing t-digest sketches, but that's not an issue in our case.

@dominiklohmann
Member Author

> Approving modulo missing integration tests.

I'll add some before merging. For now I mostly verified that this works as expected by manually comparing values. That's why I implemented collect as well.
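
The manual comparison described above can be sketched roughly as follows: gather the raw values per group, compute exact reference statistics from them, and compare against whatever the aggregation functions report. Everything here is an assumption for illustration: the sample events, the grouping by direction, and the idea that `collect` yields a list of a group's values are not confirmed by the conversation, and the reference values come from Python's `statistics` module rather than from the project itself.

```python
import math
import statistics
from collections import defaultdict

# Made-up sample events: (direction, bytes). Purely illustrative data.
events = [
    ("ingress", 120),
    ("egress", 80),
    ("ingress", 100),
    ("egress", 95),
    ("ingress", 140),
]

# Assumed behavior of a collect-style aggregation: keep every value per group.
collected = defaultdict(list)
for direction, nbytes in events:
    collected[direction].append(nbytes)

# Exact reference statistics computed from the collected values.
reference = {
    group: {
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),
        "stddev": statistics.pstdev(values),
        "median": statistics.median(values),
    }
    for group, values in collected.items()
}


def matches(reported: float, expected: float, rel_tol: float = 1e-9) -> bool:
    """Compare a reported aggregate against the exact reference value."""
    return math.isclose(reported, expected, rel_tol=rel_tol)


# Example check: a one-pass mean (sum / count) should match the reference.
one_pass_mean = sum(collected["ingress"]) / len(collected["ingress"])
print(matches(one_pass_mean, reference["ingress"]["mean"]))
```

For an approximate result such as `approximate_median`, a looser `rel_tol` would be the natural knob, since the exact median from the collected values is only a reference point and not something a sketch-based estimate is expected to hit exactly.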

Co-authored-by: Matthias Vallentin <[email protected]>
@dominiklohmann dominiklohmann merged commit 07a05aa into main May 17, 2024
19 of 21 checks passed
@dominiklohmann dominiklohmann deleted the topic/mean-aggregation-function branch May 17, 2024 08:26
3 participants