-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add documentation for Iceberg support in PrestoCPP #24741
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc! Some nits and suggestions, nothing major.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @agrawalreetika! I see PrestoCPP and Presto C++ were mixed. Let's unify them to Presto C++
are ``HIVE``, ``HADOOP``, and ``NESSIE`` and ``REST``. | ||
|
||
``iceberg.hadoop.config.resources`` The path(s) for Hadoop configuration resources. | ||
``iceberg.hadoop.config.resources`` The path(s) for Hadoop configuration resources. Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Presto C++ Support value is missing
as a join with the data of the equality delete files. | ||
|
||
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true`` | ||
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true`` Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Presto C++ Support value is missing
statistics file cache. | ||
======================================================= ============================================================= ============ | ||
======================================================= ============================================================= ================================== =================== ========================================= | ||
|
||
Table Properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also add a section for the C++ support for Table Properties? Just saying it's not suported because write is not implemented yet
@@ -1511,13 +1587,23 @@ schema evolution, such as adding, dropping, and renaming columns. With schema | |||
evolution, users can evolve a table schema with SQL after enabling the Presto | |||
Iceberg connector. | |||
|
|||
Presto C++ Support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for Partition Column Transform. Could you please add Presto C++ support as well? Thanks!
|
||
* Supports reading and writing of DWRF and PARQUET file formats, supports reading ORC file format. | ||
Hive Connector - Iceberg Table Support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following structure would be clearer
Supported Connectors
- Hive connector
- Hive table support
- Iceberg table support
- TPCH connector
- ....
- Fuzzer connector
aa5b2d3
to
76c9f55
Compare
@steveburnett @yingsu00 Thanks for your review. I have made the changes based on your comments. Please have a look at your convenience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc, it's very helpful. Only a couple of nit and little thing.
|
||
``SELECT`` Yes Yes Read is supported in Presto C++ including those with positional delete files. | ||
|
||
``INSERT INTO`` Yes No |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: duplicate with line 1155.
|
||
* Only read operations are supported for Iceberg tables. | ||
|
||
* The Iceberg connector supports both V1 and V2 tables, including those with positional delete files, but does not support equality delete files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a reader's perspective, I'm a little confused about the Iceberg connector
here, do you think it's more appropriate to use something like Hive connector for Iceberg
or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we need to create an iceberg catalog in Prestissimo as well for accessing Iceberg tables but the underline implementation on Velox is via Hive connector itself which is why I thought of keeping Hive Connector
as main header and then different table support.
I am open for suggestions here whatever we all think is more clear.
@yingsu00 What's your opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't describe it clearly. What I mean is, the content structure Hive connector/Iceberg Table Support
is no problem, just the words Iceberg connector
here may bring a little confusion. If I were a newer, I might wonder whether there is an Iceberg connector
for presto C++ that I haven't noticed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrawalreetika : If we use Java side Iceberg connector or catalog code, then its better to call it Iceberg Connector. The reuse of HiveConnector is an internal detail imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could call it Iceberg connector because it's in Presto C++ point of view, not Velox point of view. But it would be helpful to explain that the Presto C++ Iceberg connector and catalog is backed up by Velox HiveConnector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hantangwangd Thanks for the clarification. So here as @aditi-pandit & @yingsu00 mentioned from Presto C++ point of view it's going to be a different iceberg connector with something like this -
connector.name=iceberg
That is why I thought calling it iceberg connector would be better in the Presto C++ document.
@hantangwangd @aditi-pandit @yingsu00 Please take a look at the document, its already updated based on the points we discussed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got the point, thank you all for your perspective @aditi-pandit @yingsu00 @agrawalreetika.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! I pulled the updated branch, ran a new local doc build, and reviewed again.
I found a few nits that aren't directly related to your new content here that I hope we can fix here, I don't think that fixing them will add a lot of work or delay the PR.
76c9f55
to
ae78e53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thank you!
Description
Add documentation for Iceberg support in PrestoCPP
Motivation and Context
Add documentation for Iceberg support in PrestoCPP
Impact
Iceberg documentation improvement
Test Plan
Contributor checklist
Release Notes