
Optimal Strategies for Iceberg Retention and Compaction to Enhance Performance #12141

Open
Shekharrajak opened this issue Jan 31, 2025 · 0 comments
Labels
question Further information is requested


@Shekharrajak

Query engine

Spark, EMR

Question

Hello Team,

I’d like to open a discussion on the best methods for maintaining Iceberg tables. I’m interested in your use cases and perspectives on the following options:

  • Running a scheduled job that uses Iceberg's Spark APIs to handle snapshot retention (expiration), orphan file deletion, and compaction.
  • Running a scheduled job that uses Iceberg's Java table APIs for the same tasks: snapshot retention, orphan file deletion, and compaction.
  • Comparing the merge-on-read (MOR) and copy-on-write (COW) update strategies and their respective impacts on read and write performance.
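To make the retention and compaction decisions concrete, here is a minimal Python sketch of the logic such a scheduled job would apply. It is a simplified model, not Iceberg code: the functions `snapshots_to_expire` and `plan_bin_pack` and the `Snapshot` type are hypothetical. Their parameters loosely echo the `older_than`/`retain_last` arguments of Iceberg's `expire_snapshots` procedure and the bin-packing strategy of `rewrite_data_files`. The sketch assumes a linear snapshot history with no branches or tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, minimal stand-in for a table snapshot.
    snapshot_id: int
    timestamp_ms: int


def snapshots_to_expire(snapshots, older_than_ms, retain_last=1):
    """Return IDs of snapshots a simplified retention policy would expire.

    Assumes a linear history: the `retain_last` newest snapshots are always
    kept (at least one, so the current snapshot survives), and of the rest
    only those older than `older_than_ms` are expired.
    """
    keep = max(retain_last, 1)
    newest_first = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    candidates = newest_first[keep:]
    return sorted(s.snapshot_id for s in candidates if s.timestamp_ms < older_than_ms)


def plan_bin_pack(file_sizes, target_bytes, min_input_files=2):
    """Group small files into rewrite groups whose total is <= target_bytes.

    Uses first-fit decreasing. Files already at or above the target are left
    alone, and groups with fewer than `min_input_files` files are dropped,
    because rewriting them would not reduce the file count.
    """
    small = sorted((s for s in file_sizes if s < target_bytes), reverse=True)
    bins = []  # each entry: [total_bytes, [file sizes in this group]]
    for size in small:
        for b in bins:
            if b[0] + size <= target_bytes:
                b[0] += size
                b[1].append(size)
                break
        else:
            # No existing group has room, so start a new one.
            bins.append([size, [size]])
    return [b[1] for b in bins if len(b[1]) >= min_input_files]


if __name__ == "__main__":
    snaps = [Snapshot(1, 1000), Snapshot(2, 2000), Snapshot(3, 3000), Snapshot(4, 4000)]
    print(snapshots_to_expire(snaps, older_than_ms=3500, retain_last=2))  # [1, 2]
    print(plan_bin_pack([10, 20, 30, 100, 50, 40], target_bytes=64))  # [[50, 10], [40, 20]]
```

For real tables, the engine's built-in maintenance procedures would normally do this work. The sketch only shows the shape of the decisions (what to expire, what to rewrite together) that a scheduled job ends up tuning.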

Looking forward to your insights!

Shekharrajak added the “question” label (Further information is requested) on Jan 31, 2025.