
Conversation

@acarpente-denodo
Contributor

@acarpente-denodo acarpente-denodo commented Oct 10, 2025

Description

Engine support for the SQL MERGE command. This command inserts or updates rows in a target table based on specified conditions.

Syntax:

MERGE INTO target_table [ [ AS ] target_alias ]
USING { source_table | query } [ [ AS ] source_alias ]
ON search_condition
WHEN MATCHED THEN
    UPDATE SET ( column = expression [, ...] )
WHEN NOT MATCHED THEN
    INSERT [ column_list ]
    VALUES (expression, ...)

Example: use MERGE INTO to update the sales information for existing products and insert the sales information for products that are new to the market.

MERGE INTO product_sales AS s
    USING monthly_sales AS ms
    ON s.product_id = ms.product_id
WHEN MATCHED THEN
    UPDATE SET
        sales = sales + ms.sales
      , last_sale = ms.sale_date
      , current_price = ms.price
WHEN NOT MATCHED THEN
    INSERT (product_id, sales, last_sale, current_price)
    VALUES (ms.product_id, ms.sales, ms.sale_date, ms.price)

The Presto engine commit introduces an enum called RowChangeParadigm, which describes how a connector modifies rows. The Iceberg connector will use the DELETE_ROW_AND_INSERT_ROW paradigm, as it represents an updated row as a combination of a deleted row followed by an inserted row. The CHANGE_ONLY_UPDATED_COLUMNS paradigm is meant for connectors that support updating individual columns of rows.
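The paradigm described above can be sketched as a plain enum. This is a hedged illustration, not the actual Presto source; only the names RowChangeParadigm, DELETE_ROW_AND_INSERT_ROW, and CHANGE_ONLY_UPDATED_COLUMNS come from this PR description.

```java
// Hedged sketch of the RowChangeParadigm enum described above.
// The constant names come from the PR description; everything else is illustrative.
enum RowChangeParadigm
{
    // An updated row is represented as a deleted row followed by an
    // inserted row. This is the paradigm the Iceberg connector uses.
    DELETE_ROW_AND_INSERT_ROW,

    // For connectors that can update individual columns of a row in place.
    CHANGE_ONLY_UPDATED_COLUMNS
}
```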

Note: these changes were made after reviewing Trino PR trinodb/trino#7126, so this commit is heavily inspired by Trino's implementation.

Motivation and Context

The MERGE INTO statement is commonly used to integrate data from two tables with different contents but similar structures.
For example, the source table could be part of a production transactional system, while the target table might be located in a data warehouse for analytics.
MERGE operations are performed regularly to update the analytics warehouse with the latest production data.
You can also use MERGE with tables that have different structures, as long as you can define a condition to match the rows between them.

Test Plan

Automated tests developed in TestSqlParser, TestSqlParserErrorHandling, TestStatementBuilder, AbstractAnalyzerTest, TestAnalyzer, and TestClassLoaderSafeWrappers classes.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

General Changes
* Add support for the MERGE command in the Presto engine.

@sourcery-ai sourcery-ai bot left a comment

Sorry @acarpente-denodo, your pull request is larger than the review limit of 150000 diff characters

@acarpente-denodo acarpente-denodo changed the title Draft: feat: Add SQL Support for MERGE INTO in Presto (engine) feat: Add SQL Support for MERGE INTO in Presto (engine) Oct 10, 2025
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch from b8f9b3e to b27ac47 Compare October 13, 2025 12:21
@tdcmeehan tdcmeehan self-assigned this Oct 13, 2025
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch 2 times, most recently from 71aebc7 to 6f60239 Compare October 14, 2025 08:00
@steveburnett
Contributor

Please include documentation for SQL MERGE INTO as a new file in
https://github.com/prestodb/presto/tree/master/presto-docs/src/main/sphinx/sql
and add the new file to the index page
https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/sql.rst

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch from 6f60239 to 7a54a60 Compare October 14, 2025 14:00
@acarpente-denodo
Contributor Author

Hi @steveburnett. I appreciate your feedback. I added a new commit that includes the documentation for the MERGE INTO command.

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch 2 times, most recently from 8a860c1 to 61ea3bd Compare October 16, 2025 08:14
return (Table) relation;
}
checkArgument(relation instanceof AliasedRelation, "relation is neither a Table nor an AliasedRelation");
return (Table) ((AliasedRelation) relation).getRelation();
Contributor

Is it safe to presume that this will always be a table?

Contributor Author

Yes, the relation passed as a method parameter should be a Table or an AliasedRelation whose relation is a Table.
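The invariant described in this reply can be illustrated with a small, self-contained sketch. The Relation, Table, and AliasedRelation types below are simplified stand-ins for Presto's AST classes, and asMergeTarget is a hypothetical name for the method under discussion.

```java
// Stand-in types illustrating the invariant described above: the relation
// passed in is either a Table, or an AliasedRelation wrapping a Table.
interface Relation {}

final class Table implements Relation {}

final class AliasedRelation implements Relation
{
    final Relation relation;

    AliasedRelation(Relation relation)
    {
        this.relation = relation;
    }
}

static Table asMergeTarget(Relation relation)
{
    if (relation instanceof Table) {
        return (Table) relation;
    }
    if (!(relation instanceof AliasedRelation)) {
        throw new IllegalArgumentException("relation is neither a Table nor an AliasedRelation");
    }
    // Safe because of the invariant: an AliasedRelation here always wraps a Table.
    return (Table) ((AliasedRelation) relation).relation;
}
```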

joinSubPlan = subqueryPlanner.handleSubqueries(joinSubPlan, setExpression, mergeStmt, sqlPlannerContext);
expression = joinSubPlan.rewrite(setExpression);
expression = coerceIfNecessary(analysis, setExpression, expression);
expression = checkNotNullColumns(targetColumnHandle, expression, fieldNumber, mergeAnalysis);
Contributor

I don't think this should be done at the plan level, as it will be very expensive. Instead, we should send to the MergeWriterOperator what columns are non-null, and the operator can do a more efficient bulk check. Let me know if I misunderstood anything.

Contributor Author

The goal of this line is to prevent the MERGE INTO command from inserting or updating a NULL value in a non-null column. If the user attempts to execute a command that would do that, the query execution should stop as soon as possible.

The main drawback of your approach is that you have to propagate column nullability information down to MergeWriterOperator, which is not optimal. The propagation requires going from the TableScanOperator -> AssignUniqueIdOperator -> FilterAndProjectOperator -> LookupJoinOperator -> FilterAndProjectOperator -> MarkDistinctOperator -> FilterAndProjectOperator -> MergeProcessorOperator -> MergeWriterOperator.

The benefit of performing this verification during planning is that the Presto engine can cancel the query execution as soon as possible. In your approach, Presto ends up doing all the work before realizing in the last step that it was unnecessary because of an unsatisfied column constraint.
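The intent can be illustrated with a hedged, stand-alone sketch. This is not the actual checkNotNullColumns code (which operates on Presto expression trees); it only shows the shape of the guard: fail fast when a NULL is headed for a NOT NULL column.

```java
// Illustrative stand-in for the plan-time guard discussed above: reject a
// NULL value destined for a NOT NULL column as early as possible, instead
// of discovering the violation at the merge writer after all upstream work.
static Object requireNonNullColumn(Object value, String columnName)
{
    if (value == null) {
        throw new IllegalArgumentException(
                "NULL value not allowed for NOT NULL column: " + columnName);
    }
    return value;
}
```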

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch 6 times, most recently from 8204885 to 017d3e2 Compare November 4, 2025 09:55
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch 3 times, most recently from 260cc7a to 6777bf3 Compare November 7, 2025 11:16
@steveburnett steveburnett left a comment

Thanks for the doc! One comment to ask if you intended the formatting to turn out like it did.

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch 2 times, most recently from 9ae1d35 to f96475c Compare November 12, 2025 15:55
steveburnett
steveburnett previously approved these changes Nov 12, 2025
@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch 2 times, most recently from 308279e to 3f6cda5 Compare November 13, 2025 15:50
@tdcmeehan tdcmeehan left a comment

Overall, this LGTM. Thanks! Let me know when it's ready for a final round of review.

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch from 3f6cda5 to d62184e Compare November 17, 2025 15:37
@acarpente-denodo acarpente-denodo marked this pull request as ready for review November 18, 2025 17:06
@sourcery-ai sourcery-ai bot left a comment

Sorry @acarpente-denodo, your pull request is larger than the review limit of 150000 diff characters

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch from d62184e to 84b4519 Compare November 19, 2025 10:53
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO_(engine) branch from 84b4519 to 2140fe8 Compare November 19, 2025 14:10
@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new local doc build. Looks good. Thanks!

@tdcmeehan tdcmeehan left a comment

This is very well designed. Thank you. I only have nits.

// Add the target table row id field used to process the MERGE command.
ColumnHandle targetTableRowIdColumnHandle = metadata.getMergeTargetTableRowIdColumnHandle(session, tableHandle.get());
Type targetTableRowIdType = metadata.getColumnMetadata(session, tableHandle.get(), targetTableRowIdColumnHandle).getType();
Field targetTableRowIdField = Field.newUnqualified(Optional.empty(), "$target_table_row_id", targetTableRowIdType);
Contributor

Suggested change:
- Field targetTableRowIdField = Field.newUnqualified(Optional.empty(), "$target_table_row_id", targetTableRowIdType);
+ Field targetTableRowIdField = Field.newUnqualified(table.getLocation(), "$target_table_row_id", targetTableRowIdType);

return blocks;
}

static boolean[] copyIsNullAndAppendNull(@Nullable boolean[] isNull, int offsetBase, int positionCount)
Contributor

Please add a unit test
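For context, here is a hedged sketch of the semantics the helper's signature suggests (copy positionCount isNull flags starting at offsetBase, then append one slot marked null), which such a unit test could assert against. The actual Presto implementation may differ.

```java
// Presumed semantics of copyIsNullAndAppendNull, inferred from its signature:
// copy positionCount isNull flags starting at offsetBase and append one extra
// position marked null. This is a sketch, not the actual Presto code.
static boolean[] copyIsNullAndAppendNull(boolean[] isNull, int offsetBase, int positionCount)
{
    boolean[] result = new boolean[positionCount + 1];
    if (isNull != null) {
        System.arraycopy(isNull, offsetBase, result, 0, positionCount);
    }
    // A null isNull array means "no nulls", so the copied region stays false.
    // The appended position is always null.
    result[positionCount] = true;
    return result;
}
```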

return new DictionaryBlock(idsOffset, getPositionCount(), loadedDictionary, ids, false, randomDictionaryId());
}

public Block createProjection(Block newDictionary)
Contributor

Please add a unit test in TestDictionaryBlock

* DictionaryBlock, but the underlying block must be a RowBlock. The returned field blocks will be the same
* length as the specified block, which means they are not null suppressed.
*/
public static List<Block> getRowFieldsFromBlock(Block block)
Contributor

Please add a unit test in TestRowBlock


Query 20250204_010445_00022_ymwi5 failed: Iceberg table updates require at least format version 2 and update mode must be merge-on-read

Iceberg tables do not support running multiple ``MERGE`` statements on the same table in parallel. If two or more ``MERGE`` operations are executed concurrently on the same Iceberg table:
Contributor

Nit: would be good to link to the merge doc from here.

Any connector can be used as a source table for a ``MERGE`` statement.
Only connectors which support the ``MERGE`` statement can be the target of a merge operation.
See the :doc:`connector documentation </connector/>` for more information.
The ``MERGE`` statement is currently supported only by the iceberg connector.
Contributor

Suggested change:
- The ``MERGE`` statement is currently supported only by the iceberg connector.
+ The ``MERGE`` statement is currently supported only by the Iceberg connector.

throws Exception
{
ByteBuffer byteBuffer = reader.readBinary();
assert (byteBuffer.position() == 0);
Contributor

We don't use Java asserts because those can be disabled. We use checkArgument or checkState from Guava instead.
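A minimal sketch of the suggested fix. A local stand-in for Guava's Preconditions.checkState is included so the snippet is self-contained; in Presto the Guava method itself would be used.

```java
// Local stand-in for com.google.common.base.Preconditions.checkState;
// in Presto the real Guava method would be imported instead.
static void checkState(boolean expression, String message)
{
    if (!expression) {
        throw new IllegalStateException(message);
    }
}

// Instead of:  assert (byteBuffer.position() == 0);
// which is silently skipped when JVM assertions are disabled (the default),
// fail explicitly:
static void verifyBufferPosition(java.nio.ByteBuffer byteBuffer)
{
    checkState(byteBuffer.position() == 0, "byteBuffer position must be 0");
}
```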

Comment on lines +745 to +748
//
// Merge
//

Contributor

Suggested change (remove these lines):
- //
- // Merge
- //
ANALYZE_FINISH,
EXPLAIN_ANALYZE,
UPDATE,
MERGE
Contributor

Suggested change:
- MERGE
+ MERGE,
