Optimal Strategies for Iceberg Retention and Compaction to Enhance Performance #12141

Shekharrajak · 2025-01-31T12:23:01Z

Query engine

Spark, EMR

Question

Hello Team,

I’d like to open a discussion on the best methods for maintaining Iceberg tables. I’m interested in your use cases and perspectives on the following options:

A scheduled job using the Spark Iceberg table APIs to handle retention (snapshot expiration), orphan file deletion, and compaction.
A scheduled job using the Java Iceberg table APIs to handle retention, orphan file deletion, and compaction.
The MOR (Merge on Read) versus COW (Copy on Write) update strategies and their respective impacts on read and write performance.
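To make the first option concrete, here is a minimal sketch of what one pass of such a scheduled job might issue through Iceberg's Spark stored procedures (`rewrite_data_files`, `expire_snapshots`, `remove_orphan_files`). It only builds the SQL strings, so it runs without a Spark session. The catalog name `my_catalog`, the table `db.events`, the 7-day retention window, `retain_last` of 5, and the 512 MB target file size are all illustrative assumptions, not recommendations. The order shown (compact, then expire, then remove orphans) is one common choice rather than a requirement. The second helper sets the `write.delete.mode`, `write.update.mode`, and `write.merge.mode` table properties, which is how a table is switched between COW and MOR.

```python
from datetime import datetime, timedelta

# The two row-level write modes Iceberg accepts for these table properties.
VALID_WRITE_MODES = {"copy-on-write", "merge-on-read"}


def maintenance_statements(catalog, table, now, retention_days=7,
                           retain_last=5,
                           target_file_size_bytes=512 * 1024 * 1024):
    """Build Spark SQL CALL statements for one maintenance pass.

    All defaults are illustrative assumptions; tune them per table.
    The timestamp literal is interpreted in the Spark session time zone.
    """
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        # 1. Compaction: rewrite small data files toward the target size.
        (f"CALL {catalog}.system.rewrite_data_files(table => '{table}', "
         f"options => map('target-file-size-bytes', '{target_file_size_bytes}'))"),
        # 2. Retention: expire snapshots older than the cutoff,
        #    always keeping the most recent `retain_last` snapshots.
        (f"CALL {catalog}.system.expire_snapshots(table => '{table}', "
         f"older_than => TIMESTAMP '{cutoff}', retain_last => {retain_last})"),
        # 3. Orphan cleanup: delete files no table metadata references.
        (f"CALL {catalog}.system.remove_orphan_files(table => '{table}', "
         f"older_than => TIMESTAMP '{cutoff}')"),
    ]


def write_mode_statement(catalog, table, mode):
    """Build an ALTER TABLE statement that sets COW or MOR for row-level writes."""
    if mode not in VALID_WRITE_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_WRITE_MODES)}")
    props = ", ".join(
        f"'write.{op}.mode' = '{mode}'" for op in ("delete", "update", "merge")
    )
    return f"ALTER TABLE {catalog}.{table} SET TBLPROPERTIES ({props})"


if __name__ == "__main__":
    # Hypothetical catalog and table names, fixed date for reproducibility.
    for stmt in maintenance_statements("my_catalog", "db.events",
                                       datetime(2025, 1, 31)):
        print(stmt)
    print(write_mode_statement("my_catalog", "db.events", "merge-on-read"))
```

In an actual job, each string would be passed to `spark.sql(...)` on a session that has the Iceberg catalog and SQL extensions configured. With MOR, delete files accumulate between compactions, so the compaction step tends to matter more for read performance than it does with COW.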

Looking forward to your insights!

Shekharrajak added the "question" label (Further information is requested) on Jan 31, 2025
