
Log files in Hudi MOR table are not getting deleted #12702

Open
koochiswathiTR opened this issue Jan 24, 2025 · 15 comments

@koochiswathiTR (Author)

We use a MOR Hudi table. We read a Kinesis stream and process it on AWS EMR using Spark Streaming.
We use inline compaction and commit-based cleaning.
Although compaction and cleaning are running, we see that some of the Hudi log files generated during ingestion are not getting cleared fully.
We process with a batch interval of 5 minutes, and we make one Hudi commit every 5 minutes.

Below are our Hudi configs:

DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL,
DataSourceWriteOptions.RECORDKEY_FIELD.key() -> "guid",
DataSourceWriteOptions.PARTITIONPATH_FIELD.key() -> "collectionName",
DataSourceWriteOptions.PRECOMBINE_FIELD.key() -> "operationTime",
DataSourceWriteOptions.HIVE_PARTITION_FIELDS.key() -> "collectionName",
DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS.key() -> classOf[MultiPartKeysValueExtractor].getName,
DataSourceWriteOptions.HIVE_SYNC_MODE.key() -> "hms",
DataSourceWriteOptions.HIVE_USE_JDBC.key() -> "false",
HoodieCompactionConfig.INLINE_COMPACT_TRIGGER_STRATEGY.key() -> CompactionTriggerStrategy.TIME_ELAPSED.name,
HoodieCompactionConfig.INLINE_COMPACT_TIME_DELTA_SECONDS.key() -> String.valueOf(60 * 6 * 60),
HoodieCompactionConfig.CLEANER_POLICY.key() -> HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name(),
HoodieCompactionConfig.CLEANER_COMMITS_RETAINED.key() -> "228",
HoodieCompactionConfig.MIN_COMMITS_TO_KEEP.key() -> "229",
HoodieCompactionConfig.MAX_COMMITS_TO_KEEP.key() -> "252",
HoodieCompactionConfig.ASYNC_CLEAN.key() -> "false",
HoodieCompactionConfig.INLINE_COMPACT.key() -> "true",
HoodieMetricsConfig.TURN_METRICS_ON.key() -> "true",
HoodieMetricsConfig.METRICS_REPORTER_TYPE_VALUE.key() -> MetricsReporterType.DATADOG.name(),
HoodieMetricsDatadogConfig.API_SITE_VALUE.key() -> "US",
HoodieMetricsDatadogConfig.METRIC_PREFIX_VALUE.key() -> "XXXX.hudi",
HoodieMetricsDatadogConfig.API_KEY_SUPPLIER.key() -> "XXXXX.XXX.DatadogKeySupplier",
HoodieMetadataConfig.ENABLE.key() -> "false",
HoodieWriteConfig.ROLLBACK_USING_MARKERS_ENABLE.key() -> "false",
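
For reference, below is roughly how such options are applied in a Spark Structured Streaming job. It is only a minimal sketch: the table name, base path, checkpoint location, and the streamingDf / hudiOptions names are placeholders, not our actual values. With this cadence, each 5-minute micro-batch produces one deltacommit, INLINE_COMPACT_TIME_DELTA_SECONDS = 21600 schedules inline compaction roughly every 6 hours, and CLEANER_COMMITS_RETAINED = 228 corresponds to roughly 19 hours of commits.

import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.streaming.Trigger

// Placeholders -- not the real table, bucket, or checkpoint names.
val tableName = "my_hudi_table"
val basePath  = "s3://my-bucket/my-hudi-table"

val hudiOptions: Map[String, String] = Map(
  "hoodie.table.name" -> tableName
  // ...plus the DataSourceWriteOptions / compaction / cleaner / metrics keys listed above.
)

// Placeholder for the DataFrame read from the Kinesis source.
val streamingDf: DataFrame = ???

// Explicit function type avoids the Scala 2.12 foreachBatch overload ambiguity.
val writeBatch: (DataFrame, Long) => Unit = (batchDf, _) => {
  batchDf.write
    .format("hudi")
    .options(hudiOptions)
    .mode(SaveMode.Append)
    .save(basePath)
}

// One Hudi deltacommit per 5-minute micro-batch.
streamingDf.writeStream
  .trigger(Trigger.ProcessingTime("5 minutes"))
  .foreachBatch(writeBatch)
  .option("checkpointLocation", "s3://my-bucket/checkpoints/my_hudi_table")
  .start()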

Snapshots attached (two images).

If you want the .hoodie folder, please let us know and we will share it here.

  • Hudi version : 0.9

  • Spark version : 3.2.1

  • EMR Version : 6.7

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no

@koochiswathiTR (Author)

@xushiyan Please let us know how to share the .hoodie folder.

@ad1happy2go (Collaborator) commented Jan 27, 2025

@koochiswathiTR You can zip it and send it through the Apache Hudi Slack. You can ping me (Aditya Goenka) or Ranga Reddy.

If you are not able to share it, let us know; we can get on a call and debug.

@koochiswathiTR (Author)

@ad1happy2go Where can I ping you? Can I share it via Teams?

@ad1happy2go (Collaborator)

Are you "Apache Hudi" Slack community? you can share there

https://apache-hudi.slack.com/

@koochiswathiTR (Author)

I'm not able to find Apache Hudi on Slack (screenshot attached).

@koochiswathiTR (Author)

@ad1happy2go I am not able to upload the .hoodie folder here, as it is a 2.3 GB zip file, and I could not find the Apache Hudi channel in Slack.
If you can give us an S3 link to upload to, that would be good.

@koochiswathiTR (Author)

@ad1happy2go @xushiyan

@koochiswathiTR (Author)

@ad1happy2go I finally found the Hudi Slack, but I am not able to upload the file there as it is a 2.3 GB zip file.

@koochiswathiTR (Author)

@ad1happy2go I have shared the .hoodie folder with you via Slack.

@ad1happy2go (Collaborator)

Thanks @koochiswathiTR. We are looking into it.

@koochiswathiTR (Author)

@ad1happy2go @xushiyan Any update on this?

@ad1happy2go (Collaborator)

@koochiswathiTR I have been trying to reach you since Friday on the Apache Hudi Slack. Did you get a chance to check? We can also connect on a call to debug this further.

@koochiswathiTR (Author)

@ad1happy2go Uploaded it again; please check and confirm.

@koochiswathiTR (Author)

@ad1happy2go Sure, we can meet. Can you check the zip file first? What time will you be available?

@ad1happy2go (Collaborator)

@koochiswathiTR I checked the timeline files and noticed that, as you said, the clean plans contain only log files and no parquet files. I pinged you on Slack about my available timings.
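
For anyone who wants to check their own table, here is a minimal spark-shell sketch to list the clean and compaction instants under the .hoodie folder (the base path below is a placeholder, and spark is the spark-shell session):

import org.apache.hadoop.fs.{FileSystem, Path}

// Placeholder path -- substitute the real table location.
val hoodiePath = new Path("s3://my-bucket/my-hudi-table/.hoodie")
val fs: FileSystem = hoodiePath.getFileSystem(spark.sparkContext.hadoopConfiguration)

// Print clean instants and pending compaction instants from the active timeline;
// completed compactions appear as regular .commit files.
fs.listStatus(hoodiePath)
  .map(_.getPath.getName)
  .filter(name => name.contains(".clean") || name.contains(".compaction"))
  .sorted
  .foreach(println)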
