Improve page index filter for parquet scan with compound predicate containing disjunctions across columns #54475

Open
shaeqahmed opened this issue Dec 29, 2024 · 0 comments
Labels
type/enhancement Make an enhancement to StarRocks

Comments

@shaeqahmed

@Smith-Cruise maybe I don't understand this correctly, but won't this result in more IO than necessary? The zone map evaluation of the compound predicate tree unions and intersects the row ranges across columns (based on AND/OR), and the combined row ranges are then used to select pages from the offset index for each individual column (i.e. to narrow down which pages need to be read per column). That per-column calculation should only be narrowed by conjunctions with predicates on other columns, and never widened because of a disjunction with a predicate on a different column. E.g. for the compound predicate column_A = "rare" OR column_B = "common", the range / pages to read for A should not grow because of the non-selective predicate on column B.

Originally posted by @shaeqahmed in ab8abca


Enhancement

It seems the Parquet advanced zone map filter won't fully narrow down the pages to read via the offset index if the compound predicate also contains an OR with a subpredicate on a column that matches most of the rows.
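To make the column_A = "rare" OR column_B = "common" case concrete, here is a rough Python sketch of the idea. It is not StarRocks code: `Leaf`, `Node`, `candidates`, and `rows_needed_per_column` are hypothetical names, row ranges are modelled as plain sets of row ids, and the row counts are made up. It also only covers the rows needed to evaluate the predicate; if a column is projected as well, its pages for the surviving rows would presumably still have to be read afterwards.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple, Union

Rows = FrozenSet[int]


@dataclass(frozen=True)
class Leaf:
    column: str
    rows: Rows  # rows this column's zone map cannot rule out (hypothetical input)


@dataclass(frozen=True)
class Node:
    op: str  # "AND" or "OR"
    children: Tuple["Expr", ...]


Expr = Union[Leaf, Node]


def candidates(expr: Expr) -> Rows:
    """Combined row set of a subtree: union for OR, intersection for AND."""
    if isinstance(expr, Leaf):
        return expr.rows
    sets = [candidates(child) for child in expr.children]
    out = sets[0]
    for s in sets[1:]:
        out = (out | s) if expr.op == "OR" else (out & s)
    return out


def _collect(expr: Expr, context: Rows, out: Dict[str, Rows]) -> None:
    if isinstance(expr, Leaf):
        # A leaf only ever needs its own zone map rows, narrowed by the context.
        out[expr.column] = out.get(expr.column, frozenset()) | (expr.rows & context)
        return
    for i, child in enumerate(expr.children):
        ctx = context
        if expr.op == "AND":
            # Conjunction: siblings narrow the rows this child has to look at.
            for j, sibling in enumerate(expr.children):
                if j != i:
                    ctx = ctx & candidates(sibling)
        # Disjunction: siblings never widen the context passed to this child.
        _collect(child, ctx, out)


def rows_needed_per_column(expr: Expr) -> Dict[str, Rows]:
    """Rows each column must be read for in order to evaluate the predicate."""
    out: Dict[str, Rows] = {}
    _collect(expr, candidates(expr), out)
    return out


# column_A = "rare" OR column_B = "common", with made-up zone map results.
a = Leaf("column_A", frozenset(range(100, 110)))  # selective: 10 rows
b = Leaf("column_B", frozenset(range(0, 900)))    # non-selective: 900 rows
pred = Node("OR", (a, b))

combined = candidates(pred)            # what a single shared row range would cover
needed = rows_needed_per_column(pred)  # what each column needs for evaluation

# Adding a conjunction narrows every column, including A.
c = Leaf("column_C", frozenset(range(105, 200)))
needed_and = rows_needed_per_column(Node("AND", (pred, c)))
```

With these inputs the shared range covers 900 rows, while column_A by itself only needs its 10 candidate rows; the extra AND on column_C then narrows column_A further to 5 rows. Mapping each per-column row set to pages through that column's offset index would be the step after this.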

@shaeqahmed shaeqahmed added the type/enhancement Make an enhancement to StarRocks label Dec 29, 2024