Implemented `simplify` for the `starts_with` function to convert it into a LIKE expression.
#14119
Merged
6 commits
- a373693 Implemented `simplify` for the `starts_with` function to convert it i… (jatin510)
- b4c1f37 fix: escape special characters in starts_with to LIKE conversion (jatin510)
- 90b0cad updated simply function to handle utf8, largeutf8 and utf8view data t… (jatin510)
- 66dfac3 Add some more tests (alamb)
- 4fa3f17 Merge remote-tracking branch 'apache/main' into feat/simplify-starts_… (alamb)
- 5829d46 Add pruning test (alamb)
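The commits above describe rewriting `starts_with(expr, 'prefix')` into `expr LIKE 'prefix%'` with special characters escaped, for `Utf8`, `LargeUtf8` and `Utf8View` patterns. The Rust below is a minimal sketch of that rule on a toy expression tree, not DataFusion's implementation: `StrType`, `Expr`, `to_like_pattern` and `simplify_starts_with` are hypothetical names, and it assumes `\` is the LIKE escape character (as the `foo\%%` pattern in the plans suggests). Only a non-NULL string literal is rewritten; column and NULL patterns stay as `starts_with`, matching the plans in the diff. One deliberate difference: the plan output in this PR leaves `_` unescaped (`'f_o'` becomes `'f_o%'`), while this sketch escapes `\`, `%` and `_` so the pattern matches no more strings than the prefix test does.

```rust
/// The three Arrow string types the rewrite handles (modelled as tags only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum StrType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// String literal of a given type; `None` models a typed NULL.
    Literal(StrType, Option<String>),
    StartsWith(Box<Expr>, Box<Expr>),
    Like(Box<Expr>, Box<Expr>),
}

/// Escape LIKE metacharacters in `prefix`, then append `%` so the pattern
/// matches exactly the strings that begin with `prefix`.
/// Assumes `\` is the LIKE escape character.
fn to_like_pattern(prefix: &str) -> String {
    let mut out = String::with_capacity(prefix.len() + 1);
    for c in prefix.chars() {
        // Escape the escape character itself as well as both wildcards.
        if matches!(c, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('%');
    out
}

/// Rewrite `starts_with(expr, 'literal')` into `expr LIKE 'escaped%'`;
/// every other expression is returned unchanged.
fn simplify_starts_with(e: Expr) -> Expr {
    match e {
        Expr::StartsWith(value, pattern) => match *pattern {
            // Non-NULL constant: rewrite, keeping the literal's string type.
            Expr::Literal(ty, Some(prefix)) => {
                let like = Expr::Literal(ty, Some(to_like_pattern(&prefix)));
                Expr::Like(value, Box::new(like))
            }
            // Column or NULL pattern: keep the original call.
            other => Expr::StartsWith(value, Box::new(other)),
        },
        other => other,
    }
}

fn main() {
    let col = |name: &str| Box::new(Expr::Column(name.to_string()));
    let lit = |ty, s: &str| Box::new(Expr::Literal(ty, Some(s.to_string())));

    // Constant pattern: becomes LIKE with '%' escaped and '%' appended.
    let rewritten = simplify_starts_with(Expr::StartsWith(
        col("column1_utf8view"),
        lit(StrType::Utf8View, "foo%"),
    ));
    println!("{rewritten:?}");

    // Column pattern: left as starts_with.
    let kept = simplify_starts_with(Expr::StartsWith(col("column1_utf8"), col("column2_utf8")));
    println!("{kept:?}");
}
```

Keeping the literal's string type on the rewritten pattern mirrors the plans, where a `LargeUtf8` column is compared with a `LargeUtf8` pattern and a `Utf8View` column with a `Utf8View` pattern, presumably so that no cast is needed.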
```diff
@@ -598,3 +598,34 @@ drop table cpu;

 statement ok
 drop table cpu_parquet;
+
+# Test for parquet predicate pruning with `starts_with` function
+query I
+copy (values ('foo'), ('bar'), ('baz')) TO 'test_files/scratch/parquet/foo.parquet'
+----
+3
+
+statement ok
+create external table foo
+stored as parquet
+location 'test_files/scratch/parquet/foo.parquet';
+
+
+# Expect that the pruning predicate contains a comparison on the min/max value of `column1`:
+# `column1_min@0 <= g AND f <= column1_max@1`
+# (the starts_with function is not supported in the parquet predicate pruning but DataFusion rewrites
+# it to a LIKE which is then handled by the PruningPredicate)
+query TT
+explain select * from foo where starts_with(column1, 'f');
+----
+logical_plan
+01)Filter: foo.column1 LIKE Utf8View("f%")
+02)--TableScan: foo projection=[column1], partial_filters=[foo.column1 LIKE Utf8View("f%")]
+physical_plan
+01)CoalesceBatchesExec: target_batch_size=8192
+02)--FilterExec: column1@0 LIKE f%
+03)----RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1
+04)------ParquetExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/foo.parquet]]}, projection=[column1], predicate=column1@0 LIKE f%, pruning_predicate=column1_null_count@2 != column1_row_count@3 AND column1_min@0 <= g AND f <= column1_max@1, required_guarantees=[]
+
+statement ok
+drop table foo
```

Review comment on the `ParquetExec` line: "this is so cool!"
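The pruning predicate in the `ParquetExec` line turns the prefix `'f'` into the range check `column1_min <= 'g' AND 'f' <= column1_max`: a row group can only hold a value starting with `f` if its statistics overlap that range. A minimal Rust sketch of the idea, assuming byte-wise (code point) string ordering; `prefix_upper_bound` and `may_contain_prefix` are hypothetical helpers, not the `PruningPredicate` API.

```rust
/// Smallest string that sorts after every string starting with `prefix`:
/// drop trailing characters that cannot be incremented, then bump the last one.
/// Returns `None` when no such bound exists (e.g. an empty prefix).
fn prefix_upper_bound(prefix: &str) -> Option<String> {
    let mut chars: Vec<char> = prefix.chars().collect();
    while let Some(last) = chars.pop() {
        // Try successive code points; char::from_u32 rejects surrogates.
        for next in (last as u32 + 1)..=(char::MAX as u32) {
            if let Some(c) = char::from_u32(next) {
                chars.push(c);
                return Some(chars.into_iter().collect());
            }
        }
        // `last` was char::MAX: carry over to the previous character.
    }
    None
}

/// Could a row group whose column statistics are [min, max] hold a value
/// starting with `prefix`? Same shape as the plan's pruning predicate:
/// `min <= upper AND prefix <= max`.
fn may_contain_prefix(min: &str, max: &str, prefix: &str) -> bool {
    let reaches_prefix = prefix <= max;
    match prefix_upper_bound(prefix) {
        Some(upper) => reaches_prefix && min <= upper.as_str(),
        None => reaches_prefix,
    }
}

fn main() {
    println!("{:?}", prefix_upper_bound("f"));
    // The test file holds 'foo', 'bar', 'baz': min = "bar", max = "foo".
    println!("{}", may_contain_prefix("bar", "foo", "f"));
    // A row group whose values all lie in ["h", "z"] can be skipped.
    println!("{}", may_contain_prefix("h", "z", "f"));
}
```

Comparing `min` against the bumped prefix with `<=` rather than `<` keeps a row group whose minimum is exactly `'g'`; that is harmless for pruning, which only has to avoid skipping row groups that might match.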
```diff
@@ -344,9 +344,51 @@ EXPLAIN SELECT
 FROM test;
 ----
 logical_plan
-01)Projection: starts_with(test.column1_utf8view, Utf8View("äöüß")) AS c1, starts_with(test.column1_utf8view, Utf8View("")) AS c2, starts_with(test.column1_utf8view, Utf8View(NULL)) AS c3, starts_with(Utf8View(NULL), test.column1_utf8view) AS c4
+01)Projection: test.column1_utf8view LIKE Utf8View("äöüß%") AS c1, CASE test.column1_utf8view IS NOT NULL WHEN Boolean(true) THEN Boolean(true) END AS c2, starts_with(test.column1_utf8view, Utf8View(NULL)) AS c3, starts_with(Utf8View(NULL), test.column1_utf8view) AS c4
 02)--TableScan: test projection=[column1_utf8view]

+## Test STARTS_WITH is rewritten to LIKE when the pattern is a constant
+query TT
+EXPLAIN SELECT
+STARTS_WITH(column1_utf8, 'foo%') as c1,
+STARTS_WITH(column1_large_utf8, 'foo%') as c2,
+STARTS_WITH(column1_utf8view, 'foo%') as c3,
+STARTS_WITH(column1_utf8, 'f_o') as c4,
+STARTS_WITH(column1_large_utf8, 'f_o') as c5,
+STARTS_WITH(column1_utf8view, 'f_o') as c6
+FROM test;
+----
+logical_plan
+01)Projection: test.column1_utf8 LIKE Utf8("foo\%%") AS c1, test.column1_large_utf8 LIKE LargeUtf8("foo\%%") AS c2, test.column1_utf8view LIKE Utf8View("foo\%%") AS c3, test.column1_utf8 LIKE Utf8("f_o%") AS c4, test.column1_large_utf8 LIKE LargeUtf8("f_o%") AS c5, test.column1_utf8view LIKE Utf8View("f_o%") AS c6
+02)--TableScan: test projection=[column1_utf8, column1_large_utf8, column1_utf8view]
+
+## Test STARTS_WITH works with column arguments
+query TT
+EXPLAIN SELECT
+STARTS_WITH(column1_utf8, substr(column1_utf8, 1, 2)) as c1,
+STARTS_WITH(column1_large_utf8, substr(column1_large_utf8, 1, 2)) as c2,
+STARTS_WITH(column1_utf8view, substr(column1_utf8view, 1, 2)) as c3
+FROM test;
+----
+logical_plan
+01)Projection: starts_with(test.column1_utf8, substr(test.column1_utf8, Int64(1), Int64(2))) AS c1, starts_with(test.column1_large_utf8, substr(test.column1_large_utf8, Int64(1), Int64(2))) AS c2, starts_with(test.column1_utf8view, substr(test.column1_utf8view, Int64(1), Int64(2))) AS c3
+02)--TableScan: test projection=[column1_utf8, column1_large_utf8, column1_utf8view]
+
+query BBB
+SELECT
+STARTS_WITH(column1_utf8, substr(column1_utf8, 1, 2)) as c1,
+STARTS_WITH(column1_large_utf8, substr(column1_large_utf8, 1, 2)) as c2,
+STARTS_WITH(column1_utf8view, substr(column1_utf8view, 1, 2)) as c3
+FROM test;
+----
+true true true
+true true true
+true true true
+true true true
+NULL NULL NULL


 # Ensure that INIT cap works with utf8view
 query TT
 EXPLAIN SELECT
 INITCAP(column1_utf8view) as c
```

Review comment on the rewritten `01)Projection` line: "this is actually pretty cool -- it figured out that …"

```diff
@@ -887,7 +929,7 @@ EXPLAIN SELECT
 FROM test;
 ----
 logical_plan
-01)Projection: starts_with(test.column1_utf8view, Utf8View("foo")) AS c, starts_with(test.column1_utf8view, test.column2_utf8view) AS c2
+01)Projection: test.column1_utf8view LIKE Utf8View("foo%") AS c, starts_with(test.column1_utf8view, test.column2_utf8view) AS c2
 02)--TableScan: test projection=[column1_utf8view, column2_utf8view]

 ## Ensure no casts for TRANSLATE
```
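In the first hunk, `starts_with(column1_utf8view, '')` (the `c2` projection) was simplified to `CASE test.column1_utf8view IS NOT NULL WHEN Boolean(true) THEN Boolean(true) END`. That expression is true for any non-NULL value and NULL otherwise, because a `CASE` with no matching branch and no `ELSE` yields NULL. A minimal Rust sketch of that three-valued equivalence, modelling SQL NULL as `None`; both functions are hypothetical illustrations, not DataFusion code.

```rust
/// SQL `starts_with(x, prefix)` with NULL modelled as `None`:
/// NULL if either argument is NULL, otherwise a plain prefix test.
fn starts_with_sql(x: Option<&str>, prefix: Option<&str>) -> Option<bool> {
    match (x, prefix) {
        (Some(s), Some(p)) => Some(s.starts_with(p)),
        _ => None,
    }
}

/// `CASE x IS NOT NULL WHEN true THEN true END`: with no ELSE branch,
/// a CASE whose WHEN does not match evaluates to NULL.
fn case_is_not_null(x: Option<&str>) -> Option<bool> {
    // IS NOT NULL itself never returns NULL.
    let is_not_null = x.is_some();
    if is_not_null {
        Some(true)
    } else {
        None
    }
}

fn main() {
    for x in [Some("foo"), Some(""), None] {
        // Both results agree for every input, including NULL.
        println!("{:?} {:?} {:?}", x, starts_with_sql(x, Some("")), case_is_not_null(x));
    }
}
```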
I double checked the escaping logic and I think this looks good to me.
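Escaping logic like this can be checked mechanically: build the LIKE pattern for a prefix, run it through a LIKE matcher, and compare the result with a plain prefix test. The sketch below assumes `%` matches any run of characters, `_` exactly one, and `\` escapes the next character; `like`, `like_match` and `to_like_pattern` are hypothetical helpers written for illustration, not DataFusion or arrow-rs functions.

```rust
/// Minimal recursive LIKE matcher over chars (hypothetical, for illustration).
fn like(s: &[char], p: &[char]) -> bool {
    match p.split_first() {
        None => s.is_empty(),
        // `%`: try every possible split point.
        Some((&'%', rest)) => (0..=s.len()).any(|i| like(&s[i..], rest)),
        // `_`: consume exactly one character.
        Some((&'_', rest)) => !s.is_empty() && like(&s[1..], rest),
        // `\x`: match `x` literally.
        Some((&'\\', rest)) if !rest.is_empty() => {
            !s.is_empty() && s[0] == rest[0] && like(&s[1..], &rest[1..])
        }
        // Any other character (or a trailing `\`) matches itself.
        Some((&c, rest)) => !s.is_empty() && s[0] == c && like(&s[1..], rest),
    }
}

fn like_match(s: &str, pattern: &str) -> bool {
    let s: Vec<char> = s.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    like(&s, &p)
}

/// Escape `\`, `%` and `_`, then append `%` (assumes `\` is the escape char).
fn to_like_pattern(prefix: &str) -> String {
    let mut out = String::new();
    for c in prefix.chars() {
        if matches!(c, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('%');
    out
}

fn main() {
    let prefixes = ["foo%", "f_o", "a\\b", ""];
    let inputs = ["foo%bar", "foobar", "f_o", "fxo", "a\\b!", "ab", ""];
    for p in prefixes {
        let pat = to_like_pattern(p);
        for s in inputs {
            // The escaped LIKE pattern must agree with a plain prefix test.
            assert_eq!(like_match(s, &pat), s.starts_with(p), "{s:?} vs {pat:?}");
        }
    }
    // Without escaping '_', the pattern 'f_o%' also matches "fxo".
    println!("{}", like_match("fxo", "f_o%"));
}
```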