This repository has been archived by the owner on Dec 29, 2021. It is now read-only.

Grouping and Aggregation Expressions #23

Open
nevi-me opened this issue Nov 27, 2019 · 1 comment
Labels
df-lazy-ops Lazy operations and their evaluation

Comments

@nevi-me
Owner

nevi-me commented Nov 27, 2019

In order to implement aggregations, we need to be able to group data. Like joins, the task of grouping probably belongs upstream, but we should be able to define how to group data.

The LazyFrame might need some state (whether it's grouped or not) to prevent 'normal' calculations when it's in a grouped state. I don't want to implement a GroupedLazyFrame because we rely on mutating the &mut LazyFrame to add on computations.

A single aggregation call should ideally accept multiple aggregations.
A grouping should accept multiple columns; columns that are neither grouped nor aggregated get dropped.
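
To make the "grouped state" idea concrete, here is a minimal sketch, not the crate's actual code. The `LazyFrame` below is a hypothetical stand-in that only records queued operations as strings, and the method names `group_by`, `with_column`, `is_grouped` and `ops` are assumptions made for illustration. It shows one way a flag on the frame could block 'normal' calculations while grouped, without a separate GroupedLazyFrame type.

```rust
/// Hypothetical stand-in for LazyFrame: it only records queued operations.
#[derive(Debug, Default)]
pub struct LazyFrame {
    /// Queued computations, kept as strings for illustration only.
    ops: Vec<String>,
    /// `Some(columns)` while the frame is in a grouped state.
    group_by: Option<Vec<String>>,
}

impl LazyFrame {
    /// Puts the frame into a grouped state on the given columns.
    pub fn group_by(&mut self, columns: &[&str]) -> &mut Self {
        self.group_by = Some(columns.iter().map(|c| c.to_string()).collect());
        self
    }

    /// A 'normal' calculation: rejected while the frame is grouped.
    pub fn with_column(&mut self, expr: &str) -> Result<&mut Self, String> {
        if self.group_by.is_some() {
            return Err("frame is grouped; aggregate it first".to_string());
        }
        self.ops.push(format!("with_column({})", expr));
        Ok(self)
    }

    /// Accepts several aggregations at once and leaves the grouped state.
    pub fn aggregate(&mut self, aggregations: &[&str]) -> Result<&mut Self, String> {
        let keys = self
            .group_by
            .take()
            .ok_or_else(|| "aggregate called on an ungrouped frame".to_string())?;
        self.ops
            .push(format!("aggregate(by={:?}, aggs={:?})", keys, aggregations));
        Ok(self)
    }

    /// Reports whether the frame is currently grouped.
    pub fn is_grouped(&self) -> bool {
        self.group_by.is_some()
    }

    /// Returns the operations queued so far.
    pub fn ops(&self) -> &[String] {
        &self.ops
    }
}
```

With this shape, calling `with_column` between `group_by` and `aggregate` returns an error instead of queuing the computation, and everything still goes through `&mut LazyFrame`. The cost is that misuse is caught at run time rather than by the type system.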

@nevi-me added the bug and df-lazy-ops labels and removed the bug label on Nov 27, 2019
@nevi-me
Owner Author

nevi-me commented Nov 28, 2019

Data is grouped in order to be aggregated, so it might be better not to create an intermediate grouped data structure, and instead take the grouping and the aggregations at the same time.

Something like:

impl LazyFrame {
  fn aggregate(grouping: Vec<_>, aggregates: Vec<_>) -> Self;
}

The grouping can be Vec<&str>, but aggregates should be more expressive, as either functions that implement some aggregation trait, or an enum if we support a finite list of aggregations.
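
As a rough illustration of the enum option, here is a sketch under several assumptions. `ToyFrame`, `AggregateFn` and the `(column, function)` tuple are hypothetical names, not part of the crate. The sketch also evaluates eagerly on in-memory columns and returns a map, whereas the signature above returns `Self` so the real version would presumably stay lazy. It assumes all columns have the same length.

```rust
use std::collections::BTreeMap;

/// Hypothetical finite list of supported aggregations.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AggregateFn {
    Sum,
    Min,
    Max,
    Count,
}

impl AggregateFn {
    /// Reduces the values of one group to a single number.
    fn apply(self, values: &[f64]) -> f64 {
        match self {
            AggregateFn::Sum => values.iter().sum(),
            AggregateFn::Min => values.iter().copied().fold(f64::INFINITY, f64::min),
            AggregateFn::Max => values.iter().copied().fold(f64::NEG_INFINITY, f64::max),
            AggregateFn::Count => values.len() as f64,
        }
    }
}

/// Toy eager frame (an assumption for illustration): string columns to
/// group on, numeric columns to aggregate, all of equal length.
pub struct ToyFrame {
    pub keys: BTreeMap<String, Vec<String>>,
    pub values: BTreeMap<String, Vec<f64>>,
}

impl ToyFrame {
    /// Takes the grouping and the aggregations in one call. Columns that are
    /// neither grouped nor aggregated do not appear in the result.
    pub fn aggregate(
        &self,
        grouping: Vec<&str>,
        aggregates: Vec<(&str, AggregateFn)>,
    ) -> Result<BTreeMap<Vec<String>, Vec<f64>>, String> {
        // Resolve column names up front so a bad name fails early.
        let mut key_cols = Vec::new();
        for name in &grouping {
            let col = self
                .keys
                .get(*name)
                .ok_or_else(|| format!("unknown grouping column: {}", name))?;
            key_cols.push(col);
        }
        let mut value_cols = Vec::new();
        for (name, func) in &aggregates {
            let col = self
                .values
                .get(*name)
                .ok_or_else(|| format!("unknown value column: {}", name))?;
            value_cols.push((col, *func));
        }

        // Bucket row indices by composite group key.
        // An empty grouping yields no rows and therefore an empty result.
        let n_rows = key_cols.first().map_or(0, |c| c.len());
        let mut groups: BTreeMap<Vec<String>, Vec<usize>> = BTreeMap::new();
        for row in 0..n_rows {
            let key: Vec<String> = key_cols.iter().map(|c| c[row].clone()).collect();
            groups.entry(key).or_default().push(row);
        }

        // Apply every requested aggregation to every group.
        let mut out = BTreeMap::new();
        for (key, rows) in groups {
            let results: Vec<f64> = value_cols
                .iter()
                .map(|(col, func)| {
                    let vals: Vec<f64> = rows.iter().map(|&r| col[r]).collect();
                    func.apply(&vals)
                })
                .collect();
            out.insert(key, results);
        }
        Ok(out)
    }
}
```

For example, grouping on a `city` column and passing `vec![("sales", AggregateFn::Sum), ("sales", AggregateFn::Max)]` gives one entry per distinct city, each holding the two results in the order requested. A trait-based design would replace `AggregateFn` with something like a boxed trait object, which would allow user-defined aggregations at the cost of a less closed API.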

nevi-me added a commit that referenced this issue Feb 1, 2020