Commit ae78e53

Add documentation for Iceberg support in PrestoCPP
1 parent fa53b84 commit ae78e53

File tree

3 files changed

+156
-36
lines changed

presto-docs/src/main/sphinx/connector/iceberg.rst

Lines changed: 132 additions & 32 deletions
@@ -308,6 +308,11 @@ the data and delete files of Iceberg tables are stored in S3. An example configu
308308
hive.s3.endpoint=http://192.168.0.103:9878
309309
hive.s3.path-style-access=true
310310

311+
Presto C++ Support
312+
^^^^^^^^^^^^^^^^^^
313+
314+
``HIVE``, ``NESSIE``, ``REST``, and ``HADOOP`` Iceberg catalogs are supported in Presto C++.
315+
311316
Configuration Properties
312317
------------------------
313318

@@ -319,69 +324,69 @@ Configuration Properties
319324

320325
The following configuration properties are available for all catalog types:
321326

322-
======================================================= ============================================================= ============
323-
Property Name Description Default
324-
======================================================= ============================================================= ============
325-
``iceberg.catalog.type`` The catalog type for Iceberg tables. The available values ``HIVE``
327+
======================================================= ============================================================= ================================== =================== =========================================
328+
Property Name Description Default Presto Java Support Presto C++ Support
329+
======================================================= ============================================================= ================================== =================== =========================================
330+
``iceberg.catalog.type`` The catalog type for Iceberg tables. The available values ``HIVE`` Yes Yes, only needed on coordinator
326331
are ``HIVE``, ``HADOOP``, ``NESSIE``, and ``REST``.
327332

328-
``iceberg.hadoop.config.resources`` The path(s) for Hadoop configuration resources.
333+
``iceberg.hadoop.config.resources`` The path(s) for Hadoop configuration resources. Yes Yes, only needed on coordinator
329334

330335
Example: ``/etc/hadoop/conf/core-site.xml``. This property
331336
is required if ``iceberg.catalog.type`` is ``HADOOP``.
332337
Otherwise, it will be ignored.
333338

334-
``iceberg.file-format`` The storage file format for Iceberg tables. The available ``PARQUET``
339+
``iceberg.file-format`` The storage file format for Iceberg tables. The available ``PARQUET`` Yes NA
335340
values are ``PARQUET`` and ``ORC``.
336341

337-
``iceberg.compression-codec`` The compression codec to use when writing files. The ``GZIP``
342+
``iceberg.compression-codec`` The compression codec to use when writing files. The ``GZIP`` Yes NA
338343
available values are ``NONE``, ``SNAPPY``, ``GZIP``,
339344
``LZ4``, and ``ZSTD``.
340345

341-
``iceberg.max-partitions-per-writer`` The Maximum number of partitions handled per writer. ``100``
346+
``iceberg.max-partitions-per-writer`` The maximum number of partitions handled per writer. ``100`` Yes NA
342347

343-
``iceberg.minimum-assigned-split-weight`` A decimal value in the range (0, 1] is used as a minimum ``0.05``
348+
``iceberg.minimum-assigned-split-weight`` A decimal value in the range (0, 1] is used as a minimum ``0.05`` Yes Yes
344349
for weights assigned to each split. A low value may improve
345350
performance on tables with small files. A higher value may
346351
improve performance for queries with highly skewed
347352
aggregations or joins.
348353

349-
``iceberg.enable-merge-on-read-mode`` Enable reading base tables that use merge-on-read for ``true``
354+
``iceberg.enable-merge-on-read-mode`` Enable reading base tables that use merge-on-read for ``true`` Yes Yes, only needed on coordinator
350355
updates.
351356

352-
``iceberg.delete-as-join-rewrite-enabled`` When enabled, equality delete row filtering is applied ``true``
357+
``iceberg.delete-as-join-rewrite-enabled`` When enabled, equality delete row filtering is applied ``true`` Yes Yes
353358
as a join with the data of the equality delete files.
354359

355-
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true``
360+
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true`` Yes NA
356361

357-
``iceberg.statistic-snapshot-record-difference-weight`` The amount that the difference in total record count matters
362+
``iceberg.statistic-snapshot-record-difference-weight`` The amount that the difference in total record count matters Yes Yes, only needed on coordinator
358363
when calculating the closest snapshot when picking
359364
statistics. A value of 1 means a single record is equivalent
360365
to 1 millisecond of time difference.
361366

362-
``iceberg.pushdown-filter-enabled`` Experimental: Enable filter pushdown for Iceberg. This is ``false``
363-
only supported with Native Worker.
367+
``iceberg.pushdown-filter-enabled`` Experimental: Enable filter pushdown for Iceberg. This is ``false`` No Yes
368+
only supported with Presto C++.
364369

365-
``iceberg.rows-for-metadata-optimization-threshold`` The maximum number of partitions in an Iceberg table to ``1000``
370+
``iceberg.rows-for-metadata-optimization-threshold`` The maximum number of partitions in an Iceberg table to ``1000`` Yes Yes
366371
allow optimizing queries of that table using metadata. If
367372
an Iceberg table has more partitions than this threshold,
368373
metadata optimization is skipped.
369374

370375
Set to ``0`` to disable metadata optimization.
371376

372-
``iceberg.split-manager-threads`` Number of threads to use for generating Iceberg splits. ``Number of available processors``
377+
``iceberg.split-manager-threads`` Number of threads to use for generating Iceberg splits. ``Number of available processors`` Yes Yes, only needed on coordinator
373378

374-
``iceberg.metadata-previous-versions-max`` The max number of old metadata files to keep in current ``100``
375-
metadata log.
379+
``iceberg.metadata-previous-versions-max`` The maximum number of old metadata files to keep in ``100`` Yes NA
380+
current metadata log.
376381

377-
``iceberg.metadata-delete-after-commit`` Set to ``true`` to delete the oldest metadata files after ``false``
382+
``iceberg.metadata-delete-after-commit`` Set to ``true`` to delete the oldest metadata files after ``false`` Yes NA
378383
each commit.
379384

380-
``iceberg.metrics-max-inferred-column`` The maximum number of columns for which metrics ``100``
385+
``iceberg.metrics-max-inferred-column`` The maximum number of columns for which metrics ``100`` Yes NA
381386
are collected.
382-
``iceberg.max-statistics-file-cache-size`` Maximum size in bytes that should be consumed by the ``256MB``
387+
``iceberg.max-statistics-file-cache-size`` Maximum size in bytes that should be consumed by the ``256MB`` Yes Yes, only needed on coordinator
383388
statistics file cache.
384-
======================================================= ============================================================= ============
389+
======================================================= ============================================================= ================================== =================== =========================================
385390

386391
Table Properties
387392
------------------------
@@ -482,34 +487,40 @@ Deprecated Property Name New Property Name
482487
``metrics_max_inferred_column`` ``write.metadata.metrics.max-inferred-column-defaults``
483488
======================================= ===============================================================
484489

490+
Presto C++ Support
491+
^^^^^^^^^^^^^^^^^^
492+
493+
Table properties are not supported in Presto C++ because write operations have not been implemented.
485494

486495
Session Properties
487496
------------------
488497

489498
Session properties set behavior changes for queries executed within the given session.
490499

491-
===================================================== ======================================================================
492-
Property Name Description
493-
===================================================== ======================================================================
494-
``iceberg.delete_as_join_rewrite_enabled`` Overrides the behavior of the connector property
500+
===================================================== ======================================================================= =================== ==================
501+
Property Name Description Presto Java Support Presto C++ Support
502+
===================================================== ======================================================================= =================== ==================
503+
``iceberg.delete_as_join_rewrite_enabled`` Overrides the behavior of the connector property Yes Yes
495504
``iceberg.delete-as-join-rewrite-enabled`` in the current session.
496-
``iceberg.hive_statistics_merge_strategy`` Overrides the behavior of the connector property
505+
``iceberg.hive_statistics_merge_strategy`` Overrides the behavior of the connector property Yes Yes
497506
``iceberg.hive-statistics-merge-strategy`` in the current session.
498-
``iceberg.rows_for_metadata_optimization_threshold`` Overrides the behavior of the connector property
507+
``iceberg.rows_for_metadata_optimization_threshold`` Overrides the behavior of the connector property Yes Yes
499508
``iceberg.rows-for-metadata-optimization-threshold`` in the current
500509
session.
501-
``iceberg.target_split_size_bytes`` Overrides the target split size for all tables in a query in bytes.
510+
``iceberg.target_split_size_bytes`` Overrides the target split size for all tables in a query in bytes. Yes Yes
502511
Set to 0 to use the value in each Iceberg table's
503512
``read.split.target-size`` property.
504-
``iceberg.affinity_scheduling_file_section_size`` When the ``node_selection_strategy`` or
513+
``iceberg.affinity_scheduling_file_section_size`` When the ``node_selection_strategy`` or Yes Yes
505514
``hive.node-selection-strategy`` property is set to ``SOFT_AFFINITY``,
506515
this configuration property will change the size of a file chunk that
507516
is hashed to a particular node when determining which worker to
508517
assign a split to. Splits which read data from the same file within
509518
the same chunk will hash to the same node. A smaller chunk size will
510519
result in a higher probability of splits being distributed evenly across
511520
the cluster, but reduce locality.
512-
===================================================== ======================================================================
521+
``iceberg.parquet_dereference_pushdown_enabled`` Overrides the behavior of the connector property Yes No
522+
``iceberg.enable-parquet-dereference-pushdown`` in the current session.
523+
===================================================== ======================================================================= =================== ==================
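
As a minimal sketch, the session properties above can be changed with ``SET SESSION``. This assumes the Iceberg catalog is registered under the name ``iceberg``, which is what the ``iceberg.`` prefix in the table corresponds to. The values shown are illustrative only::

    -- Disable the equality-delete join rewrite for the current session
    SET SESSION iceberg.delete_as_join_rewrite_enabled = false;

    -- Target 128 MB splits for all tables in a query (the value is in bytes)
    SET SESSION iceberg.target_split_size_bytes = 134217728;

    -- Return to the connector-level default
    RESET SESSION iceberg.delete_as_join_rewrite_enabled;

If the catalog uses a different name, replace the ``iceberg.`` prefix with that catalog name.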
513524

514525
Caching Support
515526
---------------
@@ -542,6 +553,11 @@ Property Name Description
542553
this size will not be cached.
543554
==================================================== ============================================================= ============
544555

556+
Presto C++ Support
557+
~~~~~~~~~~~~~~~~~~
558+
559+
Manifest file caching is supported in Presto C++.
560+
545561
Alluxio Data Cache
546562
^^^^^^^^^^^^^^^^^^
547563

@@ -565,6 +581,11 @@ JMX queries to get the metrics and verify the cache usage::
565581

566582
SHOW TABLES FROM jmx.current like '%alluxio%';
567583

584+
Presto C++ Support
585+
~~~~~~~~~~~~~~~~~~
586+
587+
Alluxio data caching applies to Presto Java. Presto C++ supports the async data cache instead. See :ref:`async_data_caching_and_prefetching`.
588+
568589
File And Stripe Footer Cache
569590
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
570591

@@ -596,6 +617,11 @@ JMX queries to get the metrics and verify the cache usage::
596617

597618
SELECT * FROM jmx.current."com.facebook.presto.hive:name=iceberg_parquetmetadata,type=cachestatsmbean";
598619

620+
Presto C++ Support
621+
~~~~~~~~~~~~~~~~~~
622+
623+
The file and stripe footer cache is not applicable to Presto C++.
624+
599625
Metastore Cache
600626
^^^^^^^^^^^^^^^
601627

@@ -635,6 +661,11 @@ as part of a SQL query by including them in your SELECT statement.
635661
----------------------------------+------------
636662
2 | 3
637663

664+
Presto C++ Support
665+
^^^^^^^^^^^^^^^^^^
666+
667+
All of the extra hidden metadata columns above are supported in Presto C++.
668+
638669
Extra Hidden Metadata Tables
639670
----------------------------
640671

@@ -815,6 +846,10 @@ example uses the earliest snapshot ID: ``2423571386296047175``
815846
testBranch | BRANCH | 3374797416068698476 | NULL | NULL | NULL
816847
testTag | TAG | 4686954189838128572 | 10 | NULL | NULL
817848

849+
Presto C++ Support
850+
^^^^^^^^^^^^^^^^^^
851+
852+
All of the extra hidden metadata tables above are supported in Presto C++.
818853

819854
Procedures
820855
----------
@@ -1098,9 +1133,54 @@ Examples:
10981133

10991134
CALL iceberg.system.set_table_property('schema_name', 'table_name', 'commit.retry.num-retries', '10');
11001135

1136+
Presto C++ Support
1137+
^^^^^^^^^^^^^^^^^^
1138+
1139+
All of the procedures above are supported in Presto C++.
1140+
11011141
SQL Support
11021142
-----------
11031143

1144+
SQL Support Summary for Presto Java and Presto C++:
1145+
1146+
============================== ============= ============ ============================================================================
1147+
SQL Operation Presto Java Presto C++ Comments
1148+
============================== ============= ============ ============================================================================
1149+
``CREATE SCHEMA`` Yes Yes
1150+
1151+
``CREATE TABLE`` Yes Yes
1152+
1153+
``CREATE VIEW`` Yes Yes
1154+
1155+
``INSERT INTO`` Yes No
1156+
1157+
``CREATE TABLE AS SELECT`` Yes No
1158+
1159+
``SELECT`` Yes Yes Reads are supported in Presto C++, including tables with positional delete files.
1160+
1161+
``ALTER TABLE`` Yes Yes
1162+
1163+
``ALTER VIEW`` Yes Yes
1164+
1165+
``TRUNCATE`` Yes Yes
1166+
1167+
``DELETE`` Yes No
1168+
1169+
``DROP TABLE`` Yes Yes
1170+
1171+
``DROP VIEW`` Yes Yes
1172+
1173+
``DROP SCHEMA`` Yes Yes
1174+
1175+
``SHOW CREATE TABLE`` Yes Yes
1176+
1177+
``SHOW COLUMNS`` Yes Yes
1178+
1179+
``DESCRIBE`` Yes Yes
1180+
1181+
``UPDATE`` Yes No
1182+
============================== ============= ============ ============================================================================
1183+
11041184
The Iceberg connector supports querying and manipulating Iceberg tables and schemas
11051185
(databases). Here are some examples of the SQL operations supported by Presto:
11061186

@@ -1237,6 +1317,11 @@ Transform Name Source Types
12371317
``Hour`` ``timestamp``
12381318
===================== =======================================================================
12391319

1320+
Presto C++ Support
1321+
~~~~~~~~~~~~~~~~~~
1322+
1323+
Reading from tables that use partition column transforms is supported in Presto C++.
1324+
12401325
CREATE VIEW
12411326
^^^^^^^^^^^
12421327

@@ -1558,13 +1643,23 @@ schema evolution, such as adding, dropping, and renaming columns. With schema
15581643
evolution, users can evolve a table schema with SQL after enabling the Presto
15591644
Iceberg connector.
15601645

1646+
Presto C++ Support
1647+
^^^^^^^^^^^^^^^^^^
1648+
1649+
Schema evolution is supported in Presto C++.
1650+
15611651
Parquet Writer Version
15621652
----------------------
15631653

15641654
Presto now supports Parquet writer versions V1 and V2 for the Iceberg catalog.
15651655
It can be toggled using the session property ``parquet_writer_version`` and the config property ``hive.parquet.writer.version``.
15661656
Valid values for these properties are ``PARQUET_1_0`` and ``PARQUET_2_0``. Default is ``PARQUET_1_0``.
15671657

1658+
Presto C++ Support
1659+
^^^^^^^^^^^^^^^^^^
1660+
1661+
Presto C++ supports Parquet writer version V1.
1662+
15681663
Example Queries
15691664
^^^^^^^^^^^^^^^
15701665

@@ -1871,6 +1966,11 @@ Query Iceberg table by specifying the tag name:
18711966
20 | canada | 2 | comment
18721967
(3 rows)
18731968

1969+
Presto C++ Support
1970+
^^^^^^^^^^^^^^^^^^
1971+
1972+
Time travel is supported in Presto C++.
1973+
18741974
Type mapping
18751975
------------
18761976
