Skip to content

Conversation

@acarpente-denodo
Copy link
Contributor

@acarpente-denodo acarpente-denodo commented Jul 1, 2025

Description

Engine support for SQL MERGE command. This command inserts or updates rows in a table based on specified conditions.
Iceberg connector support for the SQL MERGE INTO command.

Syntax:

MERGE INTO target_table [ [ AS ]  target_alias ]
USING { source_table | query } [ [ AS ] source_alias ]
ON search_condition
WHEN MATCHED THEN
    UPDATE SET ( column = expression [, ...] )
WHEN NOT MATCHED THEN
    INSERT [ column_list ]
    VALUES (expression, ...)

Example: MERGE INTO usage to update the sales information for existing products and insert the sales information for the new products in the market.

MERGE INTO product_sales AS s
    USING monthly_sales AS ms
    ON s.product_id = ms.product_id
WHEN MATCHED THEN
    UPDATE SET
        sales = sales + ms.sales
      , last_sale = ms.sale_date
      , current_price = ms.price
WHEN NOT MATCHED THEN
    INSERT (product_id, sales, last_sale, current_price)
    VALUES (ms.product_id, ms.sales, ms.sale_date, ms.price)

The Presto engine commit introduces an enum called RowChangeParadigm, which describes how a connector modifies rows. The iceberg connector will utilize the DELETE_ROW_AND_INSERT_ROW paradigm, as it represents an updated row as a combination of a deleted row followed by an inserted row. The CHANGE_ONLY_UPDATED_COLUMNS paradigm is meant for connectors that support updating individual columns of rows.

Note: Changes were made after reviewing the following Trino PR: trinodb/trino#7126
So, this PR is deeply inspired by Trino's implementation.

Motivation and Context

The MERGE INTO statement is commonly used to integrate data from two tables with different contents but similar structures.
For example, the source table could be part of a production transactional system, while the target table might be located in a data warehouse for analytics.
Regularly, MERGE operations are performed to update the analytics warehouse with the latest production data.
You can also use MERGE with tables that have different structures, as long as you can define a condition to match the rows between them.

Test Plan

Automated tests developed in:

  • Engine: TestSqlParser, TestSqlParserErrorHandling, TestStatementBuilder, AbstractAnalyzerTest, TestAnalyzer, and TestClassLoaderSafeWrappers classes.
  • Iceberg connector: IcebergDistributedTestBase

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

General Changes
* Add support for the MERGE command in the Presto engine.

Iceberg Connector Changes
* Add support for MERGE command in the Iceberg connector.

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch 3 times, most recently from 3e4437e to 5b65ae1 Compare July 4, 2025 07:59
@aditi-pandit
Copy link
Contributor

@acarpente-denodo : Thanks for this contribution.

The code will need extensive changes for Presto C++ support. Have you investigated that ?

Did you try this code with a C++ worker (or side-car enabled ) ? We can do the full C++ support as a follow up, but its important the queries fail with a clean error message on Presto C++

@acarpente-denodo
Copy link
Contributor Author

Hi @aditi-pandit,
Thanks for your feedback. No, I haven't investigated the support for Presto C++ yet. I also haven't tested this code with a C++ worker or with side-car enabled. At this point, I'm still focused on getting the MERGE INTO command fully working on Java. Once that's stable, I will plan for proper C++ support, including ensuring clean error messages when used with Presto C++.

Copy link
Contributor

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Adrian, were you able to build this PR? I got the following errors:

[ERROR] Failed to execute goal org.basepom.maven:duplicate-finder-maven-plugin:1.5.1:check (default) on project presto-main: Found duplicate classes/resources! -> [Help 1]

Also, how did you get the "handle not found" error? Were you running a query using the query runner? Were you able to get the correct plan? Would you please share the plan and full error stack?

sql/grant
sql/grant-roles
sql/insert
sql/merge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this commit, the merge into only supports parsing and not the full functionality. Let's move the documentation to the last commit when it's fully supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it's done.

}

@Test
public void testMergeInto()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the original name testMerge()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it's done.

Copy link
Contributor

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Adrian, were you able to build this PR? The CI build failed with these:

In file included from /__w/presto/presto/presto-native-execution/./presto_cpp/main/PeriodicHeartbeatManager.h:17:
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1684:3: error: unknown type name 'com'
  com.facebook.presto.spi.MergeHandle handle = {};
  ^
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1684:6: error: expected member name or ';' after declaration specifiers
  com.facebook.presto.spi.MergeHandle handle = {};
  ~~~^
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1686:3: error: unknown type name 'ConnectorMergeTableHandle'; did you mean 'ConnectorDeleteTableHandle'?
  ConnectorMergeTableHandle connectorMergeTableHandle = {};
  ^~~~~~~~~~~~~~~~~~~~~~~~~
  ConnectorDeleteTableHandle
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:[313](https://github.com/prestodb/presto/actions/runs/16068848370/job/45348688297?pr=25470#step:17:314):8: note: 'ConnectorDeleteTableHandle' declared here
struct ConnectorDeleteTableHandle : public JsonEncodedSubclass {};
       ^
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1709:3: error: unknown type name 'RowChangeParadigm'
  RowChangeParadigm paradigm = {};
  ^
4 errors generated.

Would you please fix the build errors first?

Also, how did you get the "handle not found" error? Were you running a query using the query runner? Since the build fails I wonder how you could do that. If you could run a query, were you able to get the correct plan? Would you please share the plan and full error stack? Thanks!

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch 3 times, most recently from ace4af6 to b02d82f Compare July 14, 2025 14:46
@acarpente-denodo
Copy link
Contributor Author

acarpente-denodo commented Jul 15, 2025

Hi @yingsu00,

The PR is still in development. I'm still focused on getting the MERGE INTO command fully working on Java. The coordinator and workers are currently running on Java. I haven't investigated the MERGE INTO support for Presto C++ yet.

Automated tests have not been created yet. To test this enhancement, you need to perform a manual test. It is described at the end of the PR description in the section called "Test Plan".

I'm facing an error during the serialization of a new class called MergePartitioningHandle, which implements ConnectorPartitioningHandle interface

Caused by: java.lang.IllegalArgumentException: No connector for handle: MERGE [update = iceberg:INSTANCE]
at com.facebook.presto.metadata.HandleResolver.getId(HandleResolver.java:228)
at com.facebook.presto.metadata.HandleResolver.getId(HandleResolver.java:120)
at com.facebook.presto.metadata.AbstractTypedJacksonModule$InternalTypeResolver.idFromValueAndType(AbstractTypedJacksonModule.java:158)
at com.facebook.presto.metadata.AbstractTypedJacksonModule$InternalTypeResolver.idFromValue(AbstractTypedJacksonModule.java:150)
at com.fasterxml.jackson.databind.jsontype.impl.TypeSerializerBase.idFromValue(TypeSerializerBase.java:92)
at com.fasterxml.jackson.databind.jsontype.impl.TypeSerializerBase._generateTypeId(TypeSerializerBase.java:77)
at com.fasterxml.jackson.databind.jsontype.impl.TypeSerializerBase.writeTypePrefix(TypeSerializerBase.java:45)
at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeWithType(BeanSerializerBase.java:650)
at com.facebook.presto.metadata.AbstractTypedJacksonModule$InternalTypeSerializer.serialize(AbstractTypedJacksonModule.java:115)
at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:732)
at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:772)
... 100 more

I think there is an issue with com.facebook.presto.iceberg.IcebergUpdateHandle#INSTANCE. You can find a complete description of this problem in the Slack thread I created a few weeks ago: https://prestodb.slack.com/archives/CLYMEEJ6S/p1749466521358289

The goal of this draft PR is to get assistance in fixing this problem.

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch from 3465776 to 9658fd0 Compare July 21, 2025 10:50
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch 6 times, most recently from 20d7c91 to 441ed5b Compare August 8, 2025 06:43
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch 3 times, most recently from 09c9c54 to bca3929 Compare August 12, 2025 13:09
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format nit. Thanks for the doc!

@acarpente-denodo
Copy link
Contributor Author

Hi @steveburnett! Thank you for reviewing the feature documentation.

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch from 7bb8555 to e799d9f Compare August 13, 2025 09:32
steveburnett
steveburnett previously approved these changes Aug 13, 2025
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch 3 times, most recently from fd4bc78 to 8f49697 Compare August 18, 2025 18:22
@acarpente-denodo
Copy link
Contributor Author

Hi @acarpente-denodo I want to check in with you and see if you need any help from my team. Please share your ask if you need anything so I can try to get one of our engineers to assist. Thank you.

Hi @ethanyzhang! Thanks for offering your help. The MERGE command currently works on Iceberg tables. I'm currently working on optimizing the workload distribution across the worker nodes. The current implementation does not use the table partitioning information to spread the workload efficiently. I would appreciate it if you could help me implement the MergePartitioningHandle, IcebergNodePartitioningProvider, IcebergBucketFunction, IcebergPartitioning, IcebergPartitioningHandle, etc.

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch 4 times, most recently from 7ad95ad to 0521ca4 Compare November 17, 2025 17:51
@acarpente-denodo acarpente-denodo changed the title feat: Add SQL Support for MERGE INTO in Presto feat: Add SQL Support for MERGE INTO in Presto (engine + iceberg connector) Nov 18, 2025
@acarpente-denodo acarpente-denodo marked this pull request as ready for review November 18, 2025 17:06
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @acarpente-denodo, your pull request is larger than the review limit of 150000 diff characters

@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch from 0521ca4 to cd53db1 Compare November 19, 2025 10:55
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch 2 times, most recently from b6b49fe to aec7bed Compare November 20, 2025 08:35
acarpente-denodo and others added 5 commits November 20, 2025 15:28
Support SQL MERGE in the Iceberg connector

Cherry-pick of trinodb/trino@6cb188b

Co-authored-by: David Phillips <[email protected]>
SQL MERGE automated tests for Iceberg connector

Cherry-pick of trinodb/trino@6cb188b

Co-authored-by: David Phillips <[email protected]>
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch from aec7bed to ca228af Compare November 20, 2025 14:30
@acarpente-denodo acarpente-denodo force-pushed the feature/20578_SQL_Support_for_MERGE_INTO branch from ca228af to 00a2694 Compare November 20, 2025 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants