[HUDI-7507] Fixing Rollbacks and Cleaning to acquire locks as needed #13064

Open

wants to merge 7 commits into base: branch-0.x
Conversation

nsivabalan
Contributor

@nsivabalan nsivabalan commented Mar 31, 2025

Change Logs

  • If two concurrent writers plan a rollback at the same time, each could write its own rollback plan to the timeline. This patch fixes that. Also, following up on [HUDI-7507] Adding timestamp ordering validation before creating requested instant #11580, we are adding locks to rollback planning as well (a minimal sketch of the locking pattern follows below).
  • For cleaning, we added timestamp validation in PR 11580, but no locks were taken. We are adding locks to clean as well for performing the timestamp validation.

This is targeted against the 0.x branch.
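
To make the intended pattern concrete, here is a minimal, hedged sketch of lock-guarded rollback planning. All class and method names below (RollbackPlanningSketch, rollbackPlanAlreadyExists, writeRequestedRollbackPlan) are hypothetical placeholders, not the actual Hudi APIs; a ReentrantLock stands in for the transaction/lock manager.

  import java.util.concurrent.locks.ReentrantLock;

  // Hypothetical sketch: guard rollback planning with a table-level lock so that two
  // concurrent writers cannot both publish a rollback plan for the same commit.
  public class RollbackPlanningSketch {
    private final ReentrantLock tableLock = new ReentrantLock(); // stand-in for the lock provider

    public boolean scheduleRollback(String commitToRollback, String rollbackInstantTime) {
      tableLock.lock();
      try {
        // Re-read the timeline under the lock so we see plans written by other writers;
        // if a rollback plan for this commit already exists, bail out.
        if (rollbackPlanAlreadyExists(commitToRollback)) {
          return false;
        }
        // Validate that rollbackInstantTime is newer than everything on the timeline,
        // then write the rollback.requested instant.
        writeRequestedRollbackPlan(commitToRollback, rollbackInstantTime);
        return true;
      } finally {
        tableLock.unlock();
      }
    }

    private boolean rollbackPlanAlreadyExists(String commitToRollback) {
      return false; // placeholder: would consult the reloaded active timeline
    }

    private void writeRequestedRollbackPlan(String commitToRollback, String rollbackInstantTime) {
      // placeholder: would serialize the plan and create the rollback.requested instant
    }
  }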

Impact

Robust rollback planning even in the event of concurrent writers/planners.

Risk level (write none, low, medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of a config is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

…s before adding new rollback requested to timeline
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Mar 31, 2025
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Mar 31, 2025
  private boolean canProceedWithRollback(HoodieInstant rollbackInstant) {
    if (config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()) {
      // Check for concurrent rollbacks, i.e. if the commit being rolled back has already been rolled back, we can bail out.
      HoodieTableMetaClient reloadedMetaClient = HoodieTableMetaClient.reload(table.getMetaClient());
Contributor

do we need to rebuild the meta client?

Contributor Author

we could just reload the active timeline and leverage that.

Contributor Author

Actually, the reason is two call stacks down.
At L 152 we call validateForLatestTimestampWithoutReload() with the reloaded meta client as the argument.

Eventually this calls into

TimestampUtils.validateForLatestTimestamp

Within this method, we might need to reload the timeline in most cases; only in this rollback flow do we not need to reload it. But to reload the timeline, we need an instance of "HoodieTableMetaClient", so that we can do metaClient.reloadActiveTimeline().

In other words, we can't just take an activeTimeline as an argument to TimestampUtils.validateForLatestTimestamp.

I did think about adding two Optional arguments to this method, but felt it may not be elegant.
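
A hedged, self-contained illustration of the shape being discussed (the interfaces and methods here are stand-ins, not the real TimestampUtils or HoodieTableMetaClient signatures): the validator takes a meta client because it may need to trigger a timeline reload itself, while a "WithoutReload" entry point lets callers that have already reloaded skip the extra refresh.

  // Hypothetical sketch of why the validator takes a meta client rather than a timeline:
  // only the meta client can refresh/reload the timeline when the validator decides it must.
  final class TimestampValidationSketch {

    interface MetaClient {
      Timeline reloadActiveTimeline(); // refreshes and returns the latest timeline
      Timeline getActiveTimeline();    // returns the already-loaded timeline without a refresh
    }

    interface Timeline {
      String latestInstantTime();      // highest instant time currently on the timeline
    }

    // Common path: refresh the timeline before validating.
    static void validateForLatestTimestamp(MetaClient metaClient, String proposedInstantTime) {
      validate(metaClient.reloadActiveTimeline(), proposedInstantTime);
    }

    // Rollback path: the caller already reloaded the meta client, so skip the refresh.
    static void validateForLatestTimestampWithoutReload(MetaClient reloadedMetaClient, String proposedInstantTime) {
      validate(reloadedMetaClient.getActiveTimeline(), proposedInstantTime);
    }

    private static void validate(Timeline timeline, String proposedInstantTime) {
      // Instant times are fixed-width numeric strings, so lexicographic comparison works here.
      if (proposedInstantTime.compareTo(timeline.latestInstantTime()) <= 0) {
        throw new IllegalStateException("Instant time " + proposedInstantTime
            + " is not newer than the latest timeline instant " + timeline.latestInstantTime());
      }
    }
  }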

Contributor Author

I have attempted a fix to make this more elegant; you can check it out.

@nsivabalan nsivabalan changed the title [HUDI-7507] Fixing rollbacks for concurrent rollback planning [HUDI-7507] Fixing Rollbacks and Cleaning to acquire locks as needed Apr 3, 2025
      }
      table.validateForLatestTimestamp(cleanInstant.getTimestamp());
    } finally {
      if (!skipLocking) {
Contributor

is the lock only used for timestamp validation? so we still support concurrent cleaning?

Contributor Author

Timestamp validation just ensures that the timestamp chosen for this clean is higher than all other timestamps generated so far (based on the timeline).
So it does not mean concurrent cleaning.

We could have had a concurrent compaction instant that got added to the timeline around the same time the clean instant was generated, but with a higher timestamp than this clean instant. The clean planning just took a non-trivial amount of time, during which the compaction plan was added to the timeline.
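
A hedged sketch of the race described above (all names hypothetical): the clean instant time is chosen before planning, planning takes a while, and a concurrent compaction can land on the timeline with a higher timestamp in the meantime, so the validation under the lock aborts rather than writing an out-of-order clean.requested instant.

  // Hypothetical sketch: choose the clean timestamp up front, plan without the lock,
  // then validate ordering under the lock just before writing clean.requested.
  public class CleanSchedulingSketch {
    private final Object tableLock = new Object(); // stand-in for the real lock provider

    interface TimelineView {
      String newInstantTime();
      String latestInstantTime();
      void writeRequestedClean(String instantTime, Object plan);
    }

    public boolean scheduleClean(TimelineView timeline) {
      String cleanInstantTime = timeline.newInstantTime(); // chosen before planning
      Object cleanPlan = buildCleanPlan();                 // may take a non-trivial amount of time

      synchronized (tableLock) {
        // Re-check the timeline under the lock: if anything newer (e.g. a compaction
        // requested by another writer) showed up, abort instead of writing a
        // clean.requested instant with an out-of-order timestamp.
        if (cleanInstantTime.compareTo(timeline.latestInstantTime()) <= 0) {
          return false; // abort (or retry with a fresh instant time)
        }
        timeline.writeRequestedClean(cleanInstantTime, cleanPlan);
        return true;
      }
    }

    private Object buildCleanPlan() {
      return new Object(); // placeholder for the actual clean plan
    }
  }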

@@ -666,7 +670,7 @@ private void rollbackInflightInstant(HoodieInstant inflightInstant,
         -> entry.getRollbackInstant().getTimestamp())
         .orElseGet(HoodieActiveTimeline::createNewInstantTime);
     scheduleRollback(context, commitTime, inflightInstant, false, config.shouldRollbackUsingMarkers(),
-        false);
+        false, false);
Contributor

Can the skipLocking flag be maintained outside of the hoodie table and all kinds of executors? Like this:

  if (skipLocking) {
    scheduleRollback(...)
  } else {
    txnManager.startTxn ...
    scheduleRollback(...)
    txnManager.endTxn ...
  }

I feel like the table and executor should not care about whether the lock should be used; that would make the code cleaner.

Contributor Author

For rollbacks, we can do that. But for cleaning, we just need the lock for timestamp validation, so this increases the locking granularity: the entire planning would be under a lock :(
And especially since clean planning could take a non-trivial amount of time, we are trying not to hold locks for the entire planning phase.

Ideally, if we standardized all planning to be under a lock, all this would smooth out. But locking is also costly, so we don't want to take locks just for code maintenance/structuring purposes.
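
To make the granularity trade-off concrete, a hedged comparison (hypothetical names, not the actual Hudi classes): pushing the lock outside means the whole, potentially slow, planning phase holds it; keeping the lock inside guards only the timestamp validation and the write of the requested instant.

  // Hypothetical comparison of lock scope for clean scheduling.
  public class LockGranularitySketch {
    private final Object lock = new Object(); // stand-in for the lock provider

    // Coarse-grained: the entire (potentially slow) planning phase holds the lock.
    public void scheduleCleanCoarse(String instantTime) {
      synchronized (lock) {
        Object plan = buildCleanPlan();          // non-trivial time spent while holding the lock
        validateTimestamp(instantTime);
        writeRequestedClean(instantTime, plan);
      }
    }

    // Fine-grained (roughly what the discussion above argues for with cleaning):
    // plan outside the lock, guard only the validation and the requested-instant write.
    public void scheduleCleanFine(String instantTime) {
      Object plan = buildCleanPlan();            // runs without the lock
      synchronized (lock) {
        validateTimestamp(instantTime);
        writeRequestedClean(instantTime, plan);
      }
    }

    private Object buildCleanPlan() { return new Object(); }
    private void validateTimestamp(String instantTime) { /* compare against a reloaded timeline */ }
    private void writeRequestedClean(String instantTime, Object plan) { /* write clean.requested */ }
  }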

Contributor

Can you elaborate again on why the cleaning service needs a timestamp check? I'm afraid the timestamp check in the 0.x branch would fail the cleaning a lot.

Contributor Author

hey @kbuci : Can you help clarify w/ the use-case here?

Contributor

At least, when the check fails, just abort the cleaning instead of throwing.

Contributor

Sure. Because of the scenario mentioned in https://issues.apache.org/jira/browse/HUDI-7507 under (S1), we need to make sure that when a clean is scheduled at a given instant time on the data table, there isn't a compaction plan on the MDT with a greater instant time. Using the guard in the validate-timestamp API (taking a lock + refreshing timelines + ensuring there are no ingestion/clustering/compaction instant times on the data table with a greater timestamp) will avoid this. Alternatively, we could use a different validate-timestamp API for clean (one that just checks for the compaction timestamp on the MDT), but then we would no longer be using a single validate-timestamp API for all operations.
If clean planning were to generate its instant time within a lock, then I think that should lessen the chance of having to abort due to a new, later ingestion write starting.
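
A hedged illustration of the specific ordering constraint (hypothetical names, not Hudi's API): a data-table operation at instant time t later writes a deltacommit to the metadata table at the same time t, so before scheduling a clean at t we check that no MDT compaction plan exists with a greater instant time.

  import java.util.Optional;

  // Hypothetical sketch of the S1 guard: the clean's instant time must not fall behind
  // an MDT compaction plan that has already been scheduled, because the clean will later
  // write a deltacommit to the MDT at that same instant time.
  final class CleanVsMdtCompactionGuard {

    static boolean safeToScheduleClean(String cleanInstantTime,
                                       Optional<String> latestMdtCompactionInstant) {
      // If an MDT compaction plan exists with a greater instant time, the clean's
      // deltacommit would land "behind" a plan that was supposed to cover it,
      // corrupting the MDT timeline ordering.
      return latestMdtCompactionInstant
          .map(compactionTime -> cleanInstantTime.compareTo(compactionTime) > 0)
          .orElse(true);
    }

    public static void main(String[] args) {
      // Clean chose 20250401000100, but an MDT compaction was already planned at 20250401000200.
      System.out.println(safeToScheduleClean("20250401000100", Optional.of("20250401000200"))); // false -> abort
      System.out.println(safeToScheduleClean("20250401000300", Optional.of("20250401000200"))); // true  -> proceed
      System.out.println(safeToScheduleClean("20250401000100", Optional.empty()));              // true  -> no MDT compaction pending
    }
  }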

Contributor

@danny0405 danny0405 Apr 8, 2025

we need to make sure that when a clean is scheduled at a given instant time on the data table, there isn't a compaction plan on the MDT with a greater instant time.

It sounds very restrictive and may break the Flink cleaning workflow; we may need to skip it for Flink, because Flink does not enable the MDT in the 0.x branch.

(S1) If Job 2 is an ingestion commit and Job 1 is an ingestion commit that also does compaction/log compaction on the MDT, then when Job 1 runs before Job 2, it can create a compaction plan for all instant times (up to (x)) that doesn't include instant time (x-1). Later, Job 2 will create instant time (x-1), but the timeline will be in a corrupted state since the compaction plan was supposed to include (x-1).

As for S1, how could the MDT compaction plan be generated when there are pending instants on the DT timeline with a smaller timestamp? Should we allow that?

Contributor

It sounds very restrictive and may break the Flink cleaning workflow; we may need to skip it for Flink, because Flink does not enable the MDT in the 0.x branch.

Yes, good point: for clean scheduling we can avoid doing the validateTimestamp check if the dataset has no MDT. I would prefer, though, that we skip it based on whether or not the dataset has an MDT rather than whether ingestion uses the Flink engine, since I'm not sure there's a straightforward way for the clean schedule call to infer the execution engine used by ingestion.

As for S1, how could the MDT compaction plan be generated when there are pending instants on the DT timeline with a smaller timestamp? Should we allow that?

Oh, so in S1 the MDT compaction plan is able to be scheduled since there is no inflight instant on the data table at that point in time (which is correct/expected behavior). But (without the validateTimestamp check) the other concurrent clean schedule call on the data table can generate a lower timestamp, which will be the same timestamp used on the MDT write (since an operation on the data table at instant time i will write a corresponding deltacommit to the MDT with instant time i).

@@ -746,7 +747,7 @@ public HoodieCleanMetadata clean(String cleanInstantTime, boolean scheduleInline
     LOG.info("Cleaner started");
     // proceed only if multiple clean schedules are enabled or if there are no pending cleans.
     if (scheduleInline) {
-      scheduleTableServiceInternal(cleanInstantTime, Option.empty(), TableServiceType.CLEAN);
+      scheduleClean(cleanInstantTime);
Contributor Author

This caller was not taking a lock for scheduling the clean before this patch. I am standardizing all calls to scheduleClean to take locks in this patch.
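
A minimal hedged sketch of what "standardizing" could look like (names hypothetical, not the actual Hudi code): lock acquisition lives inside scheduleClean itself so every caller gets the same behavior, and skipLocking exists only for callers that are already inside a transaction.

  import java.util.concurrent.locks.ReentrantLock;

  // Hypothetical sketch: lock acquisition is owned by scheduleClean, so all callers
  // behave identically; skipLocking is for callers already holding the transaction lock.
  public class ScheduleCleanSketch {
    private final ReentrantLock txnLock = new ReentrantLock(); // stand-in for the transaction manager

    public void scheduleClean(String cleanInstantTime, boolean skipLocking) {
      if (!skipLocking) {
        txnLock.lock();
      }
      try {
        validateTimestampAndWriteRequestedClean(cleanInstantTime);
      } finally {
        if (!skipLocking) {
          txnLock.unlock();
        }
      }
    }

    private void validateTimestampAndWriteRequestedClean(String cleanInstantTime) {
      // placeholder: validate cleanInstantTime against a reloaded timeline,
      // then write the clean.requested instant
    }
  }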

@hudi-bot

hudi-bot commented Apr 4, 2025

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

Labels
size:L PR with lines of changes in (300, 1000]
4 participants