Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for Iceberg support in PrestoCPP #24741

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

agrawalreetika
Copy link
Member

Description

Add documentation for Iceberg support in PrestoCPP

Motivation and Context

Add documentation for Iceberg support in PrestoCPP

Impact

Iceberg documentation improvement

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@agrawalreetika agrawalreetika requested a review from yingsu00 March 17, 2025 18:46
@agrawalreetika agrawalreetika self-assigned this Mar 17, 2025
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Mar 17, 2025
@prestodb-ci prestodb-ci requested review from a team, nmahadevuni and auden-woolfson and removed request for a team March 17, 2025 18:46
@github-actions github-actions bot added the docs label Mar 17, 2025
@prestodb-ci prestodb-ci requested a review from a team March 17, 2025 18:49
auden-woolfson
auden-woolfson previously approved these changes Mar 17, 2025
Copy link
Contributor

@auden-woolfson auden-woolfson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc! Some nits and suggestions, nothing major.

Copy link
Contributor

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agrawalreetika! I see PrestoCPP and Presto C++ were mixed. Let's unify them to Presto C++

are ``HIVE``, ``HADOOP``, and ``NESSIE`` and ``REST``.

``iceberg.hadoop.config.resources`` The path(s) for Hadoop configuration resources.
``iceberg.hadoop.config.resources`` The path(s) for Hadoop configuration resources. Yes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Presto C++ Support value is missing

as a join with the data of the equality delete files.

``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true``
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true`` Yes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Presto C++ Support value is missing

statistics file cache.
======================================================= ============================================================= ============
======================================================= ============================================================= ================================== =================== =========================================

Table Properties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add a section for the C++ support for Table Properties? Just saying it's not suported because write is not implemented yet

@@ -1511,13 +1587,23 @@ schema evolution, such as adding, dropping, and renaming columns. With schema
evolution, users can evolve a table schema with SQL after enabling the Presto
Iceberg connector.

Presto C++ Support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for Partition Column Transform. Could you please add Presto C++ support as well? Thanks!


* Supports reading and writing of DWRF and PARQUET file formats, supports reading ORC file format.
Hive Connector - Iceberg Table Support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following structure would be clearer
Supported Connectors

  • Hive connector
    • Hive table support
    • Iceberg table support
  • TPCH connector
    • ....
  • Fuzzer connector

@agrawalreetika
Copy link
Member Author

@steveburnett @yingsu00 Thanks for your review. I have made the changes based on your comments. Please have a look at your convenience.

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc, it's very helpful. Only a couple of nit and little thing.


``SELECT`` Yes Yes Read is supported in Presto C++ including those with positional delete files.

``INSERT INTO`` Yes No
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: duplicate with line 1155.


* Only read operations are supported for Iceberg tables.

* The Iceberg connector supports both V1 and V2 tables, including those with positional delete files, but does not support equality delete files.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a reader's perspective, I'm a little confused about the Iceberg connector here, do you think it's more appropriate to use something like Hive connector for Iceberg or something else?

Copy link
Member Author

@agrawalreetika agrawalreetika Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we need to create an iceberg catalog in Prestissimo as well for accessing Iceberg tables but the underline implementation on Velox is via Hive connector itself which is why I thought of keeping Hive Connector as main header and then different table support.

I am open for suggestions here whatever we all think is more clear.
@yingsu00 What's your opinion on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't describe it clearly. What I mean is, the content structure Hive connector/Iceberg Table Support is no problem, just the words Iceberg connector here may bring a little confusion. If I were a newer, I might wonder whether there is an Iceberg connector for presto C++ that I haven't noticed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agrawalreetika : If we use Java side Iceberg connector or catalog code, then its better to call it Iceberg Connector. The reuse of HiveConnector is an internal detail imo.

Copy link
Contributor

@yingsu00 yingsu00 Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could call it Iceberg connector because it's in Presto C++ point of view, not Velox point of view. But it would be helpful to explain that the Presto C++ Iceberg connector and catalog is backed up by Velox HiveConnector.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hantangwangd Thanks for the clarification. So here as @aditi-pandit & @yingsu00 mentioned from Presto C++ point of view it's going to be a different iceberg connector with something like this -

connector.name=iceberg

That is why I thought calling it iceberg connector would be better in the Presto C++ document.

@hantangwangd @aditi-pandit @yingsu00 Please take a look at the document, its already updated based on the points we discussed here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got the point, thank you all for your perspective @aditi-pandit @yingsu00 @agrawalreetika.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! I pulled the updated branch, ran a new local doc build, and reviewed again.

I found a few nits that aren't directly related to your new content here that I hope we can fix here, I don't think that fixing them will add a lot of work or delay the PR.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs from:IBM PR from IBM
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants