-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(native): Add TextReader registration #25995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR integrates a Velox TextReader into the native execution engine by introducing a new configuration flag, registering the reader in server startup, updating build/link settings, and extending the existing test suite and Maven configurations to cover the TEXTFILE format. Entity relationship diagram for TEXTFILE format supporterDiagram
TABLES ||--o{ TEXTFILE : supports
TEXTFILE {
string path
string format
}
Class diagram for updated SystemConfig with TextReaderWriter flagclassDiagram
class SystemConfig {
+bool textReaderWriterEnabled() const
-bool textWriterEnabled() const
<<static>> string_view kTextReaderWriterEnabled
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
1ffcec3 to
8e18145
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Consider splitting the combined
text-reader-writer-enabledflag into two separate flags (text-reader-enabledandtext-writer-enabled) so you can roll out reader and writer independently and keep the semantics clear. - Since this PR relies on specific Velox fixes, please update the Velox submodule reference or add a version guard in your CMake configuration to ensure the required patches are pulled in before building.
- You’ve added several near-duplicate test classes for TEXTFILE in TPCDS/TPCH—consider using a TestNG data provider or shared helper to parameterize storage formats and reduce boilerplate.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider splitting the combined `text-reader-writer-enabled` flag into two separate flags (`text-reader-enabled` and `text-writer-enabled`) so you can roll out reader and writer independently and keep the semantics clear.
- Since this PR relies on specific Velox fixes, please update the Velox submodule reference or add a version guard in your CMake configuration to ensure the required patches are pulled in before building.
- You’ve added several near-duplicate test classes for TEXTFILE in TPCDS/TPCH—consider using a TestNG data provider or shared helper to parameterize storage formats and reduce boilerplate.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
8e18145 to
11c24bd
Compare
9dc3789 to
95a7b8e
Compare
|
Should this be mentioned in the Presto doc? Perhaps in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/features.rst. |
|
@steveburnett Thanks for the review. |
|
@aditi-pandit @kewang1024 @zacw7 |
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xin-zhang2. Have one important comment about the config change, else the rest are minor changes.
| static constexpr std::string_view kTextWriterEnabled{"text-writer-enabled"}; | ||
| // Add to temporarily help with gradual rollout for text reader and writer | ||
| // TODO: remove once text reader and writer are fully rolled out | ||
| static constexpr std::string_view kTextReaderWriterEnabled{"text-reader-writer-enabled"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xin-zhang2 : Would expect that Meta sets this value in clusters.. So we don't rename configs like that as its not backward compatible. Lets leave the config the same, but just document that its for both reader and writer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aditi-pandit Thanks for the review. I‘ve changed the code to leave the config name text-writer-enabled for backward compatibility, and use kTextReaderWriterEnabled for the config variable name as it matches the code logic better. Does this sound good to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitted it into two separate properties for semantic clarity, finer-grained control and backward compatibility.
| throws Exception | ||
| { | ||
| return PrestoNativeQueryRunnerUtils.nativeHiveQueryRunnerBuilder() | ||
| .setStorageFormat("TEXTFILE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we set this.storageFormat and use setStorageFormat in the 2 different QueryRunners ? Lets make them consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both QueryRunners use setStorageFormat to set the hive.storage-format config.
Setting this.storageFormat is to override the default storageFormat value in AbstractTestNativeTpcdsQueries, which uses this field to construct appropriate CREATE TABLE statements and handle tests differently based on storage formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move these tests to presto-native-tests module to avoid inflating presto-native-execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the tpch and tpcds tests are currently in presto-native-execution module. Are we planning to move them to presto-native-tests? If so, I think it might be better to do that in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xin-zhang2 Tpch and tpcds tests with Parquet are a smoke test for native engine, so they will continue to be a part of presto-native-execution. These tests are mostly for text reading abilities. And that is not a core feature as such. So hence was the recommendation to move to presto-native-tests which is the fuil suite of presto native tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the tpch and tpchds query tests for TextReader to presto-native-tests.
95a7b8e to
070950a
Compare
That makes sense, thanks! Please add documentation for this config property to the Presto Java config property page. https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties.rst |
|
@steveburnett Thanks. |
Documentation should describe the current state of the software. A PR adding a config property to Presto should include documentation for the new property, as described in All new language features, new functions, session and config properties, and major features have documentation added. (from Designing Your Code in CONTRIBUTING.md. ) If the new config property is a Presto C++ config, then it should be documented in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/features.rst. If and when the property is removed from Presto in the future, the documentation can be deleted at that time. |
070950a to
49fc64e
Compare
|
@steveburnett |
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull branch, local doc build, looks good.
Thanks for the doc!
zacw7
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we rename the title something like Add TextReader enablement config or Add TextReader registration to be more specific?
|
@zacw7 Thanks for the review. I've renamed the title to Add TextReader registration. |
Thanks for the retitle. What are the test failures? seem to be unrelated. |
|
@zacw7 Yes, the failures were unrelated. I re-ran the jobs and they are clean now. |
| "web_sales", "web_site"}; | ||
| Map<String, Long> deletedRowsMap = new HashMap<>(); | ||
|
|
||
| public void setStorageFormat(String storageFormat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think a public method is good for this. storageFormat should be protected scope and just assigned in test setup, similar to TestPrestoNativeTpcdsQueriesOrcUsingThrift or in the test class constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. I've changed storageFormat to protected scope and removed the setter.
...ts/src/test/java/com/facebook/presto/nativetests/TestTextReaderWithTpchQueriesUsingJSON.java
Show resolved
Hide resolved
...ts/src/test/java/com/facebook/presto/nativetests/TestTextReaderWithTpchQueriesUsingJSON.java
Outdated
Show resolved
Hide resolved
...src/test/java/com/facebook/presto/nativetests/TestTextReaderWithTpcdsQueriesUsingThrift.java
Outdated
Show resolved
Hide resolved
49fc64e to
215a349
Compare
215a349 to
4ff5204
Compare
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xin-zhang2
|
@tdcmeehan It seems this PR requires a review from presto/committers. Can you please help take a look? Thanks! |
BryanCutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
Add TextReader registration.
It's a continuation of #25626.
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.