Projection pushdown optimization #4180

Draft
wants to merge 42 commits into
base: main

balavinaithirthan
Contributor

@balavinaithirthan commented May 2, 2024

Here, we implement projection pushdown: the selection made by the select operator is pushed upstream through the pipeline so that the feather and parquet parsers can take advantage of it. In this PR, we modified every inherited optimize function to take a selection type, made feather and parquet read only the selected columns, and handled the cases where select can be pushed upstream. This is a good starting point for the summarize and put operators, because the framework for moving selection information is now in place.
Semantically, an empty columnar selection means that no selection should be performed, so a null selection needs special care: here, it acts as a blocker. Likewise, when an optimize result is returned with std::nullopt, the selection is blocked. Thus, order_invariant and do_not_optimize are blockers.

Currently, only slice, batch, and pass allow the selection to move upstream. More should be added carefully.
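
The blocking rule described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: the `selection` alias, the `op` struct, and the functions `optimize`, `push_upstream`, and `columns_to_read` are hypothetical stand-ins, and the sketch assumes that a null selection is modeled as `std::nullopt` and that both a null and an empty selection make the source read every column.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the columnar selection type.
// std::nullopt models a "null selection", which acts as a blocker.
using selection = std::optional<std::vector<std::string>>;

// Hypothetical operator model: only the name matters for this sketch.
struct op {
  std::string name;
};

// Returns the selection that may continue upstream past `o`,
// or std::nullopt if `o` blocks it.
selection optimize(const op& o, const selection& incoming) {
  if (!incoming) {
    return std::nullopt; // already blocked further downstream
  }
  // Assumption taken from the description: only these three let it through.
  if (o.name == "slice" || o.name == "batch" || o.name == "pass") {
    return incoming;
  }
  return std::nullopt; // every other operator conservatively blocks
}

// `between` lists the operators between the source and `select`, in
// pipeline order. The selection travels from `select` back to the source.
selection push_upstream(const std::vector<op>& between, selection sel) {
  for (auto it = between.rbegin(); it != between.rend(); ++it) {
    sel = optimize(*it, sel);
  }
  return sel;
}

// What a columnar reader would load: a null or empty selection means
// "perform no selection", so all columns are read.
std::vector<std::string> columns_to_read(const std::vector<std::string>& all,
                                         const selection& sel) {
  if (!sel || sel->empty()) {
    return all;
  }
  return *sel;
}
```

With `slice`, `batch`, and `pass` between the source and `select`, the selection reaches the source unchanged and only those columns are read. If any other operator sits in between, `push_upstream` returns `std::nullopt` and the reader falls back to all columns, which is the safe default when adding operators to the pass-through list one at a time.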

@balavinaithirthan force-pushed the topic/projectionPushdownOptimization branch from 95f199a to 9c32232 May 2, 2024 15:07
@balavinaithirthan added the performance and improvement labels May 22, 2024
libtenzir/src/pipeline.cpp (outdated, resolved)
libtenzir/src/pipeline.cpp (outdated, resolved)
libtenzir/include/tenzir/columnar_selection.hpp (outdated, resolved)
libtenzir/include/tenzir/selection.hpp (outdated, resolved)
libtenzir/include/tenzir/selection.hpp (outdated, resolved)
libtenzir/include/tenzir/selection.hpp (outdated, resolved)
libtenzir/include/tenzir/selection.hpp (outdated, resolved)
libtenzir/include/tenzir/plugin.hpp (resolved)
libtenzir/include/tenzir/pipeline.hpp (resolved)
libtenzir/builtins/formats/feather.cpp (outdated, resolved)
return std::nullopt;
}
Contributor Author

Should I remove these braces added on auto save?

Contributor Author

I can discard the type.cpp changes.

(void)order;
return nullptr;
}
bool selection_optimized = false;
bool expression_optimized = false;
bool order_optimized = false;
Contributor Author

I had to add these to distinguish which type of optimization had occurred.

Labels
improvement (An incremental enhancement of an existing feature), performance (Improvements or regressions of performance)
2 participants