Skip to content

Conversation

@xin-zhang2
Copy link
Contributor

@xin-zhang2 xin-zhang2 commented Sep 9, 2025

Description

Add TextReader registration.
It's a continuation of #25626.

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Prestissimo (Native Execution) Changes
*  Add TextReader support for tables in TEXTFILE format. 

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Sep 9, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Sep 9, 2025

Reviewer's Guide

This PR integrates a Velox TextReader into the native execution engine by introducing a new configuration flag, registering the reader in server startup, updating build/link settings, and extending the existing test suite and Maven configurations to cover the TEXTFILE format.

Entity relationship diagram for TEXTFILE format support

erDiagram
TABLES ||--o{ TEXTFILE : supports
TEXTFILE {
  string path
  string format
}
Loading

Class diagram for updated SystemConfig with TextReaderWriter flag

classDiagram
class SystemConfig {
  +bool textReaderWriterEnabled() const
  -bool textWriterEnabled() const
  <<static>> string_view kTextReaderWriterEnabled
}
Loading

File-Level Changes

Change Details Files
Enable and register Velox TextReader in the C++ server
  • Replaced textWriterEnabled flag with combined textReaderWriterEnabled in system config
  • Conditionally register/unregister text reader factory in PrestoServer startup
  • Linked the Velox text reader registration library in CMakeLists
presto-native-execution/presto_cpp/main/common/Configs.h
presto-native-execution/presto_cpp/main/common/Configs.cpp
presto-native-execution/presto_cpp/main/PrestoServer.cpp
presto-native-execution/presto_cpp/main/CMakeLists.txt
Extend and adapt native tests to validate TEXTFILE support
  • Added end-to-end TEXTFILE format test in AbstractTestNativeGeneralQueries
  • Enhanced TPCDS setup to include TEXTFILE tables in AbstractTestNativeTpcdsQueries
  • Created new test classes for TPCDS (Thrift) and TPCH (JSON) with TEXTFILE
  • Updated Spark test stub to defer enabling textfile reader
  • Removed legacy unsupported-textfile tests
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeGeneralQueries.java
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeTpcdsQueries.java
presto-native-execution/src/test/java/com/facebook/presto/spark/TestPrestoSparkNativeGeneralQueries.java
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeTpcdsQueriesTextfileUsingThrift.java
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeTpchQueriesTextfileUsingJSON.java
Update Maven configurations to run TEXTFILE tests
  • Removed textfile_reader exclusion from surefire plugin settings
  • Adjusted excluded test groups to reflect new TEXTFILE grouping
presto-native-execution/pom.xml
presto-native-tests/pom.xml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@xin-zhang2 xin-zhang2 marked this pull request as ready for review September 11, 2025 21:16
@xin-zhang2 xin-zhang2 requested review from a team as code owners September 11, 2025 21:16
@prestodb-ci prestodb-ci requested review from a team, BryanCutler and infvg and removed request for a team September 11, 2025 21:16
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Consider splitting the combined text-reader-writer-enabled flag into two separate flags (text-reader-enabled and text-writer-enabled) so you can roll out reader and writer independently and keep the semantics clear.
  • Since this PR relies on specific Velox fixes, please update the Velox submodule reference or add a version guard in your CMake configuration to ensure the required patches are pulled in before building.
  • You’ve added several near-duplicate test classes for TEXTFILE in TPCDS/TPCH—consider using a TestNG data provider or shared helper to parameterize storage formats and reduce boilerplate.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider splitting the combined `text-reader-writer-enabled` flag into two separate flags (`text-reader-enabled` and `text-writer-enabled`) so you can roll out reader and writer independently and keep the semantics clear.
- Since this PR relies on specific Velox fixes, please update the Velox submodule reference or add a version guard in your CMake configuration to ensure the required patches are pulled in before building.
- You’ve added several near-duplicate test classes for TEXTFILE in TPCDS/TPCH—consider using a TestNG data provider or shared helper to parameterize storage formats and reduce boilerplate.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@xin-zhang2 xin-zhang2 force-pushed the textReader branch 2 times, most recently from 9dc3789 to 95a7b8e Compare October 16, 2025 10:00
@xin-zhang2 xin-zhang2 changed the title [native] Add TextReader feat(native): Add TextReader Oct 16, 2025
@steveburnett
Copy link
Contributor

Should this be mentioned in the Presto doc? Perhaps in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/features.rst.

@xin-zhang2
Copy link
Contributor Author

@steveburnett Thanks for the review.
It seems that presto_cpp/features.rst is intended for features exclusive to Presto C++. Since TextReader is already supported in Presto Java and this PR aims to add it in Presto C++, it might not be necessary to list it in that document.

@xin-zhang2
Copy link
Contributor Author

@aditi-pandit @kewang1024 @zacw7
Could you help take a review when you have time? Thanks!

Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xin-zhang2. Have one important comment about the config change, else the rest are minor changes.

static constexpr std::string_view kTextWriterEnabled{"text-writer-enabled"};
// Add to temporarily help with gradual rollout for text reader and writer
// TODO: remove once text reader and writer are fully rolled out
static constexpr std::string_view kTextReaderWriterEnabled{"text-reader-writer-enabled"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xin-zhang2 : Would expect that Meta sets this value in clusters.. So we don't rename configs like that as its not backward compatible. Lets leave the config the same, but just document that its for both reader and writer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit Thanks for the review. I‘ve changed the code to leave the config name text-writer-enabled for backward compatibility, and use kTextReaderWriterEnabled for the config variable name as it matches the code logic better. Does this sound good to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitted it into two separate properties for semantic clarity, finer-grained control and backward compatibility.

throws Exception
{
return PrestoNativeQueryRunnerUtils.nativeHiveQueryRunnerBuilder()
.setStorageFormat("TEXTFILE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we set this.storageFormat and use setStorageFormat in the 2 different QueryRunners ? Lets make them consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both QueryRunners use setStorageFormat to set the hive.storage-format config.
Setting this.storageFormat is to override the default storageFormat value in AbstractTestNativeTpcdsQueries, which uses this field to construct appropriate CREATE TABLE statements and handle tests differently based on storage formats.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move these tests to presto-native-tests module to avoid inflating presto-native-execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the tpch and tpcds tests are currently in presto-native-execution module. Are we planning to move them to presto-native-tests? If so, I think it might be better to do that in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xin-zhang2 Tpch and tpcds tests with Parquet are a smoke test for native engine, so they will continue to be a part of presto-native-execution. These tests are mostly for text reading abilities. And that is not a core feature as such. So hence was the recommendation to move to presto-native-tests which is the fuil suite of presto native tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the tpch and tpchds query tests for TextReader to presto-native-tests.

@steveburnett
Copy link
Contributor

@steveburnett Thanks for the review. It seems that presto_cpp/features.rst is intended for features exclusive to Presto C++. Since TextReader is already supported in Presto Java and this PR aims to add it in Presto C++, it might not be necessary to list it in that document.

That makes sense, thanks! Please add documentation for this config property to the Presto Java config property page. https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties.rst

@xin-zhang2
Copy link
Contributor Author

@steveburnett Thanks. text-writer-enabled is a Presto C++ config so it should not be added to the Presto Java config property page. Since it is a temporary config and is going to be removed in the future, I think it might be better to leave it undocumented. @aditi-pandit what do you think?

@steveburnett
Copy link
Contributor

@steveburnett Thanks. text-writer-enabled is a Presto C++ config so it should not be added to the Presto Java config property page. Since it is a temporary config and is going to be removed in the future, I think it might be better to leave it undocumented. @aditi-pandit what do you think?

Documentation should describe the current state of the software. A PR adding a config property to Presto should include documentation for the new property, as described in

All new language features, new functions, session and config properties, and major features have documentation added.

(from Designing Your Code in CONTRIBUTING.md. )

If the new config property is a Presto C++ config, then it should be documented in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/features.rst.

If and when the property is removed from Presto in the future, the documentation can be deleted at that time.

@xin-zhang2 xin-zhang2 requested a review from elharo as a code owner October 30, 2025 13:39
@xin-zhang2
Copy link
Contributor Author

@steveburnett
Thanks for the explanation. This config property is for Presto C++, but it's not a C++ specific feature, so I think it would be better documented in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties.rst.
I've updated the PR. Could you please help take another look?

steveburnett
steveburnett previously approved these changes Oct 30, 2025
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local doc build, looks good.

Thanks for the doc!

Copy link
Member

@zacw7 zacw7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we rename the title something like Add TextReader enablement config or Add TextReader registration to be more specific?

@xin-zhang2 xin-zhang2 changed the title feat(native): Add TextReader feat(native): Add TextReader registration Oct 30, 2025
@xin-zhang2
Copy link
Contributor Author

@zacw7 Thanks for the review. I've renamed the title to Add TextReader registration.

@zacw7
Copy link
Member

zacw7 commented Oct 30, 2025

@zacw7 Thanks for the review. I've renamed the title to Add TextReader registration.

Thanks for the retitle. What are the test failures? seem to be unrelated.

@xin-zhang2
Copy link
Contributor Author

@zacw7 Yes, the failures were unrelated. I re-ran the jobs and they are clean now.

@xin-zhang2 xin-zhang2 requested a review from zacw7 November 3, 2025 22:35
"web_sales", "web_site"};
Map<String, Long> deletedRowsMap = new HashMap<>();

public void setStorageFormat(String storageFormat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think a public method is good for this. storageFormat should be protected scope and just assigned in test setup, similar to TestPrestoNativeTpcdsQueriesOrcUsingThrift or in the test class constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I've changed storageFormat to protected scope and removed the setter.

Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xin-zhang2

@xin-zhang2
Copy link
Contributor Author

@tdcmeehan It seems this PR requires a review from presto/committers. Can you please help take a look? Thanks!

Copy link
Contributor

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@libianoss libianoss requested a review from tdcmeehan November 18, 2025 17:37
@tdcmeehan tdcmeehan self-assigned this Nov 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants