Skip to content

Spark: Include info about failed commits in the result of RewriteDataFilesAction#15226

Open
omerhadari wants to merge 2 commits intoapache:mainfrom
omerhadari:report-failed-commits-as-failed-files
Open

Spark: Include info about failed commits in the result of RewriteDataFilesAction#15226
omerhadari wants to merge 2 commits intoapache:mainfrom
omerhadari:report-failed-commits-as-failed-files

Conversation

@omerhadari
Copy link

@omerhadari omerhadari commented Feb 3, 2026

When partial progress is enabled and a commit fails, the file groups in that batch were previously lost. When looking at the result, failedDataFilesCount is 0 when commits failed.

When some commits failed and some succeeded, it's unclear by looking at the result that it is actually partial.

One choice I'm not sure about and am happy to get feedback on is that this change makes it so the creation of result objects now treats commit failures as other rewrite failures (both failure count and failed data files).
Given the aggregative nature of the result object, I couldn't think of a situation where the distinction is important enough to justify changing its structure, but I am open to changing it to be counted separately.

I haven't seen an AI policy, but FYI I used AI to implement tests and changed them manually to fit the style other similar tests in the same area.

When partial progress is enabled and a commit fails, the file groups
in that batch were previously lost. This change tracks them in a new
failedRewrites queue and exposes them via the failures() method,
allowing RewriteDataFilesSparkAction to include them in the result's
rewriteFailures list.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ensures commit failures are reflected in RewriteDataFilesAction results when partial progress is enabled, so failed file groups are no longer “lost” from the aggregated result.

Changes:

  • Track and expose file groups that failed during commit batches in BaseCommitService.
  • Include commit-failed file groups in RewriteDataFilesSparkAction’s aggregated rewriteFailures and failedDataFilesCount.
  • Extend Spark action tests across Spark versions to assert failures/counts when a commit fails.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java Adds commit-failed file groups into the action result’s failure aggregation.
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java Asserts failures and failed data file counts for commit-failure partial progress.
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java Same as Spark 4.1 change for older Spark line.
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java Same as Spark 4.1 test tightening for older Spark line.
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java Same as Spark 4.1 change for Spark 3.5 line.
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java Same as Spark 4.1 test tightening for Spark 3.5 line.
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java Same as Spark 4.1 change for Spark 3.4 line.
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java Same as Spark 4.1 test tightening for Spark 3.4 line.
core/src/main/java/org/apache/iceberg/actions/BaseCommitService.java Tracks failed commit batches’ groups and exposes them via a new failures() API.
core/src/test/java/org/apache/iceberg/actions/TestCommitService.java Adds test coverage to ensure commit failures are tracked and queryable.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +164 to +170
/** Returns all File groups which failed to commit */
public List<T> failures() {
Preconditions.checkState(
committerService.isShutdown(),
"Cannot get failures from a service which has not been closed");
return Lists.newArrayList(failedRewrites.iterator());
}
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new Javadoc and precondition message are a bit unclear/grammatically awkward. Consider updating to clarify that these are file groups from commit batches that threw, and tweak wording/capitalization (e.g., “file groups that failed to commit” and “service that has not been closed”).

Copilot uses AI. Check for mistakes.
Copy link
Author

@omerhadari omerhadari Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…s/TestRewriteDataFilesAction.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant