-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented simplify
for the starts_with
function to convert it into a LIKE expression.
#14119
Conversation
…nto a LIKE expression, enabling predicate pruning optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jatin510 -- this looks great
Can you please add some tests for this?
Specifically, I think you can use an explain test in sqllogictests, something like the following. The EXPLAIN
should show LIKE
being used in the physical plan
> create table t(x varchar) as values ('foo'), ('bar');
0 row(s) fetched.
Elapsed 0.013 seconds.
> explain select starts_with(x, 'fo') from t;
+---------------+----------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------+
| logical_plan | Projection: starts_with(t.x, Utf8("fo")) |
| | TableScan: t projection=[x] |
| physical_plan | ProjectionExec: expr=[starts_with(x@0, fo) as starts_with(t.x,Utf8("fo"))] |
| | MemoryExec: partitions=1, partition_sizes=[1] |
| | |
+---------------+----------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.003 seconds.
) -> Result<ExprSimplifyResult> { | ||
if let Expr::Literal(ScalarValue::Utf8(Some(pattern))) = &args[1] { | ||
// Convert starts_with (col, 'prefix') to col LIKE 'prefix%' | ||
let like_pattern = format!("{}%", pattern); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably need to escape any %
that appears in the pattern (or avoid doing this replacement when such a pattern exists)
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look |
…ype. and updated the coresponding test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -344,7 +344,7 @@ EXPLAIN SELECT | |||
FROM test; | |||
---- | |||
logical_plan | |||
01)Projection: starts_with(test.column1_utf8view, Utf8View("äöüß")) AS c1, starts_with(test.column1_utf8view, Utf8View("")) AS c2, starts_with(test.column1_utf8view, Utf8View(NULL)) AS c3, starts_with(Utf8View(NULL), test.column1_utf8view) AS c4 | |||
01)Projection: test.column1_utf8view LIKE Utf8View("äöüß%") AS c1, CASE test.column1_utf8view IS NOT NULL WHEN Boolean(true) THEN Boolean(true) END AS c2, starts_with(test.column1_utf8view, Utf8View(NULL)) AS c3, starts_with(Utf8View(NULL), test.column1_utf8view) AS c4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is actually pretty cool -- it figured out that STARTS_WITH(column1_utf8view, '') as c2,
is true if column1_utf8view
is NOT NULL
_info: &dyn SimplifyInfo, | ||
) -> Result<ExprSimplifyResult> { | ||
if let Expr::Literal(scalar_value) = &args[1] { | ||
// Convert starts_with(col, 'prefix') to col LIKE 'prefix%' with proper escaping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked the escaping logic and I think this looks good to me.
This is really sweet. You an see it working to prune parquet files here:
Specifically the predicate I will also add a test to this PR demonstrating this too |
01)CoalesceBatchesExec: target_batch_size=8192 | ||
02)--FilterExec: column1@0 LIKE f% | ||
03)----RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1 | ||
04)------ParquetExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/foo.parquet]]}, projection=[column1], predicate=column1@0 LIKE f%, pruning_predicate=column1_null_count@2 != column1_row_count@3 AND column1_min@0 <= g AND f <= column1_max@1, required_guarantees=[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is so cool!
This looks great! |
Implemented
simplify
for thestarts_with
function to convert it into aLIKE
expressionWhich issue does this PR close?
Closes #14027.
Rationale for this change
Using 'LIKE' expr for 'starts_with' to enable predicate pruning.
What changes are included in this PR?
Implemented
simplify
for thestarts_with
function to convert it into a LIKE expression, enabling predicate pruning optimization.Are these changes tested?
Ues
Are there any user-facing changes?
No