Support pruning on starts_with #14027

Closed
alamb opened this issue Jan 6, 2025 · 3 comments · Fixed by #14119
Labels: enhancement, good first issue

Comments

alamb commented Jan 6, 2025

Is your feature request related to a problem or challenge?

@adriangb implemented PruningPredicate support for prefix matching LIKE / NOT LIKE in

However, it isn't currently supported for the starts_with function

Describe the solution you'd like

I would like predicate pruning to happen for the starts_with function as well

So queries like

select * from my_file where starts_with(col, 'http://')

Could also use starts_with to prune parquet files
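
To make the pruning idea concrete, here is a minimal, self-contained sketch of how min/max statistics could rule out a file for a prefix predicate. The `StringStats` type and both functions are hypothetical names for illustration, not DataFusion's `PruningPredicate` API, and the sketch assumes plain byte-wise string ordering.

```rust
/// Min/max string statistics for one file or row group.
/// Hypothetical type for illustration only.
struct StringStats<'a> {
    min: &'a str,
    max: &'a str,
}

/// Returns an exclusive upper bound for all strings that start with `prefix`,
/// or `None` when no bound can be built (for example, an empty prefix).
fn prefix_upper_bound(prefix: &str) -> Option<String> {
    let mut chars: Vec<char> = prefix.chars().collect();
    while let Some(last) = chars.pop() {
        // Try to bump the last character to the next code point. If that is
        // not a valid char, drop it and bump the previous one instead, which
        // gives a looser but still correct bound.
        if let Some(next) = char::from_u32(last as u32 + 1) {
            chars.push(next);
            return Some(chars.into_iter().collect());
        }
    }
    None
}

/// True when the statistics prove that no value can start with `prefix`.
fn can_skip_file(stats: &StringStats<'_>, prefix: &str) -> bool {
    // Every matching value v satisfies prefix <= v < upper_bound.
    if stats.max < prefix {
        return true;
    }
    match prefix_upper_bound(prefix) {
        Some(upper) => stats.min >= upper.as_str(),
        None => false,
    }
}
```

With this sketch, a file whose values all sort below `'http://'`, or at or above `'http:/0'`, can be skipped without being read; any other file has to be scanned.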

Describe alternatives you've considered

The challenge at the moment is that PruningPredicate can't refer directly to the function implementations

Given how optimized LIKE is, one possible solution would be to change starts_with so that it doesn't just call an arrow kernel, but is instead rewritten into another expression

https://github.com/apache/datafusion/blob/main/datafusion/functions/src/string/starts_with.rs

So for example, it could be rewritten into Expr::Like by implementing simplify:

https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.simplify

We could do something similar with ends_with as well
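
A rough sketch of the shape such a rewrite could take, using a toy `Expr` enum rather than DataFusion's real types. Everything here (`Expr`, `rewrite_to_like`, `escape_like_literal`) is a hypothetical stand-in, not the actual `ScalarUDFImpl::simplify` signature. It assumes backslash is the LIKE escape character and only rewrites when the second argument is a string literal.

```rust
/// Toy expression tree, standing in for a real logical expression type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    ScalarFunction { name: String, args: Vec<Expr> },
    Like { expr: Box<Expr>, pattern: Box<Expr> },
}

/// Escapes LIKE wildcards so the text only matches itself.
/// Assumes backslash is the escape character.
fn escape_like_literal(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if matches!(c, '%' | '_' | '\\') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Rewrites `starts_with(e, 'lit')` to `e LIKE 'lit%'` and
/// `ends_with(e, 'lit')` to `e LIKE '%lit'`.
/// Returns `None` when the expression is left unchanged.
fn rewrite_to_like(expr: &Expr) -> Option<Expr> {
    let Expr::ScalarFunction { name, args } = expr else {
        return None;
    };
    // Only a literal second argument can be turned into a LIKE pattern.
    let [input, Expr::Literal(text)] = args.as_slice() else {
        return None;
    };
    let pattern = match name.as_str() {
        "starts_with" => format!("{}%", escape_like_literal(text)),
        "ends_with" => format!("%{}", escape_like_literal(text)),
        _ => return None,
    };
    Some(Expr::Like {
        expr: Box::new(input.clone()),
        pattern: Box::new(Expr::Literal(pattern)),
    })
}
```

The escaping step matters because a prefix such as `'50%_off'` contains LIKE wildcards; appending `%` without escaping would change what the predicate matches.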

Additional context

No response

alamb added the enhancement and good first issue labels on Jan 6, 2025
alamb commented Jan 6, 2025

I think this is a good first issue, as rewriting a function should be straightforward and doesn't require in-depth knowledge of the rest of the engine

jatin510 commented Jan 6, 2025

take

alamb commented Jan 7, 2025

I think the rewrite would look something like

select * from my_file where starts_with(col, 'http://')

Rewritten to the equivalent of

select * from my_file where col LIKE 'http://%'
