-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat: Add SQL Support for MERGE INTO in Presto (engine + iceberg connector) #25470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: Add SQL Support for MERGE INTO in Presto (engine + iceberg connector) #25470
Conversation
3e4437e to
5b65ae1
Compare
|
@acarpente-denodo : Thanks for this contribution. The code will need extensive changes for Presto C++ support. Have you investigated that ? Did you try this code with a C++ worker (or side-car enabled ) ? We can do the full C++ support as a follow up, but its important the queries fail with a clean error message on Presto C++ |
|
Hi @aditi-pandit, |
yingsu00
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Adrian, were you able to build this PR? I got the following errors:
[ERROR] Failed to execute goal org.basepom.maven:duplicate-finder-maven-plugin:1.5.1:check (default) on project presto-main: Found duplicate classes/resources! -> [Help 1]
Also, how did you get the "handle not found" error? Were you running a query using the query runner? Were you able to get the correct plan? Would you please share the plan and full error stack?
| sql/grant | ||
| sql/grant-roles | ||
| sql/insert | ||
| sql/merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this commit, the merge into only supports parsing and not the full functionality. Let's move the documentation to the last commit when it's fully supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it's done.
| } | ||
|
|
||
| @Test | ||
| public void testMergeInto() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the original name testMerge()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it's done.
yingsu00
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Adrian, were you able to build this PR? The CI build failed with these:
In file included from /__w/presto/presto/presto-native-execution/./presto_cpp/main/PeriodicHeartbeatManager.h:17:
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1684:3: error: unknown type name 'com'
com.facebook.presto.spi.MergeHandle handle = {};
^
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1684:6: error: expected member name or ';' after declaration specifiers
com.facebook.presto.spi.MergeHandle handle = {};
~~~^
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1686:3: error: unknown type name 'ConnectorMergeTableHandle'; did you mean 'ConnectorDeleteTableHandle'?
ConnectorMergeTableHandle connectorMergeTableHandle = {};
^~~~~~~~~~~~~~~~~~~~~~~~~
ConnectorDeleteTableHandle
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:[313](https://github.com/prestodb/presto/actions/runs/16068848370/job/45348688297?pr=25470#step:17:314):8: note: 'ConnectorDeleteTableHandle' declared here
struct ConnectorDeleteTableHandle : public JsonEncodedSubclass {};
^
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1709:3: error: unknown type name 'RowChangeParadigm'
RowChangeParadigm paradigm = {};
^
4 errors generated.
Would you please fix the build errors first?
Also, how did you get the "handle not found" error? Were you running a query using the query runner? Since the build fails I wonder how you could do that. If you could run a query, were you able to get the correct plan? Would you please share the plan and full error stack? Thanks!
ace4af6 to
b02d82f
Compare
|
Hi @yingsu00, The PR is still in development. I'm still focused on getting the MERGE INTO command fully working on Java. The coordinator and workers are currently running on Java. I haven't investigated the MERGE INTO support for Presto C++ yet. Automated tests have not been created yet. To test this enhancement, you need to perform a manual test. It is described at the end of the PR description in the section called "Test Plan". I'm facing an error during the serialization of a new class called MergePartitioningHandle, which implements ConnectorPartitioningHandle interface I think there is an issue with The goal of this draft PR is to get assistance in fixing this problem. |
3465776 to
9658fd0
Compare
20d7c91 to
441ed5b
Compare
09c9c54 to
bca3929
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format nit. Thanks for the doc!
|
Hi @steveburnett! Thank you for reviewing the feature documentation. |
7bb8555 to
e799d9f
Compare
fd4bc78 to
8f49697
Compare
Hi @ethanyzhang! Thanks for offering your help. The MERGE command currently works on Iceberg tables. I'm currently working on optimizing the workload distribution across the worker nodes. The current implementation does not use the table partitioning information to spread the workload efficiently. I would appreciate it if you could help me implement the MergePartitioningHandle, IcebergNodePartitioningProvider, IcebergBucketFunction, IcebergPartitioning, IcebergPartitioningHandle, etc. |
7ad95ad to
0521ca4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @acarpente-denodo, your pull request is larger than the review limit of 150000 diff characters
0521ca4 to
cd53db1
Compare
b6b49fe to
aec7bed
Compare
Cherry-pick of trinodb/trino@cee96c3 Co-authored-by: David Stryker <[email protected]>
Automated tests. Cherry-pick of trinodb/trino@cee96c3 Co-authored-by: David Stryker <[email protected]>
Added MERGE INTO statement documentation.
Support SQL MERGE in the Iceberg connector Cherry-pick of trinodb/trino@6cb188b Co-authored-by: David Phillips <[email protected]>
SQL MERGE automated tests for Iceberg connector Cherry-pick of trinodb/trino@6cb188b Co-authored-by: David Phillips <[email protected]>
aec7bed to
ca228af
Compare
ca228af to
00a2694
Compare
Description
Engine support for SQL MERGE command. This command inserts or updates rows in a table based on specified conditions.
Iceberg connector support for the SQL MERGE INTO command.
Syntax:
Example: MERGE INTO usage to update the sales information for existing products and insert the sales information for the new products in the market.
The Presto engine commit introduces an enum called RowChangeParadigm, which describes how a connector modifies rows. The iceberg connector will utilize the
DELETE_ROW_AND_INSERT_ROWparadigm, as it represents an updated row as a combination of a deleted row followed by an inserted row. TheCHANGE_ONLY_UPDATED_COLUMNSparadigm is meant for connectors that support updating individual columns of rows.Note: Changes were made after reviewing the following Trino PR: trinodb/trino#7126
So, this PR is deeply inspired by Trino's implementation.
Motivation and Context
The
MERGE INTOstatement is commonly used to integrate data from two tables with different contents but similar structures.For example, the source table could be part of a production transactional system, while the target table might be located in a data warehouse for analytics.
Regularly, MERGE operations are performed to update the analytics warehouse with the latest production data.
You can also use MERGE with tables that have different structures, as long as you can define a condition to match the rows between them.
Test Plan
Automated tests developed in:
Contributor checklist
Release Notes