
[SUPPORT] GCP GCE Dataproc job fails after checking whether the .commit.requested file exists #12734

Open
sweir-thescore opened this issue Jan 29, 2025 · 4 comments


@sweir-thescore

Tips before filing an issue

  • Have you gone through our FAQs? Yes, and also searched the relevant GitHub issues

  • Join the mailing list to engage in conversations and get faster support at [email protected].

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

  • Due to a GCE cluster re-configuration (see below*), a GCP GCE Dataproc job will fail with one of the following:
    • org.apache.hudi.timeline.service.RequestHandler: Bad request response due to client view behind server view
    • common.table.timeline.HoodieActiveTimeline: Checking for file exists ?gs://REDACTED/.hoodie/20250129171955324.commit.requested
    • org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20250129171955324
  • The completed rollback file is then never created, which leaves us with an incomplete rollback (only the .rollback.requested and .rollback.inflight files; see the listing sketch after the cluster notes below).
  • Subsequent Dataproc runs then have issues with this failed rollback.
  • It seems like there may be something strange going on between the driver node pool and the embedded timeline server.
  • Deleting a requested commit doesn't necessarily prevent the subsequent Dataproc job from running successfully.

*The new GCE cluster is set up to use:

  • Four (4) node pools now:
    • Main
    • Driver
    • Primary Workers
    • Secondary Workers
  • The previous GCE cluster setup:
    • In the old cluster, the driver and executors were on two different node types.
    • The driver container ran on the main nodes, whereas now the driver runs on a dedicated node pool.
    • In this previous setup, we did not encounter these issues. Typically, we had a graceful shutdown and didn't end up losing a commit rollback file.
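
For illustration, this is roughly how the incomplete rollback shows up in the timeline folder (a minimal sketch using gsutil against the redacted base path; `<instant>` stands in for the actual rollback instant timestamp):

```bash
# List the timeline files for the dangling rollback instant (placeholder timestamp).
# In the failure case only the .requested and .inflight files exist; the
# completed <instant>.rollback file is never written.
gsutil ls "gs://path/to/REDACTED/.hoodie/<instant>.rollback*"
# -> gs://path/to/REDACTED/.hoodie/<instant>.rollback.inflight
# -> gs://path/to/REDACTED/.hoodie/<instant>.rollback.requested
```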

To Reproduce

Steps to reproduce the behavior:

  1. Run a Dataproc job on the GCE cluster with a dedicated driver node pool
  2. ??

It is unclear currently how we can reproduce this issue consistently ourselves.

Expected behavior

The Hudi timeline client view does not fall behind the server view and cause this problem.

Environment Description

  • Hudi version : 0.14.1

  • Spark version : 3.1.3

  • Hive version : 3.1.3

  • Hadoop version : 3.2.4

  • Storage (HDFS/S3/GCS..) : GCS

  • Running on Docker? (yes/no) : yes

Additional context

--target-table REDACTED
--target-base-path gs://path/to/REDACTED
--source-ordering-field updated_at
--min-sync-interval-seconds 15
--source-limit 16000
--continuous --source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource
--payload-class org.apache.hudi.common.model.debezium.PostgresDebeziumAvroPayload
--table-type COPY_ON_WRITE
--op UPSERT
--compact-scheduling-weight 3
--delta-sync-scheduling-weight 4
--post-write-termination-strategy-class org.apache.hudi.utilities.streamer.NoNewDataTerminationStrategy
--hoodie-conf hoodie.table.name=REDACTED
--hoodie-conf hoodie.base.path=gs://path/to/REDACTED
--hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
--hoodie-conf hoodie.keygen.timebased.timestamp.type=DATE_STRING
--hoodie-conf hoodie.keygen.timebased.input.dateformat=yyyy-MM-dd
--hoodie-conf hoodie.keygen.timebased.output.dateformat=yyyy-MM-dd
--hoodie-conf max.rounds.without.new.data.to.shutdown=5
--hoodie-conf sasl.mechanism=PLAIN
--hoodie-conf security.protocol=SASL_SSL
--hoodie-conf hoodie.datasource.write.reconcile.schema=true
--hoodie-conf bootstrap.servers=REDACTED.gcp.confluent.cloud:REDACTED
--hoodie-conf sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='REDACTED' password='REDACTED';
--hoodie-conf schema.registry.url=https://REDACTED.gcp.confluent.cloud
--hoodie-conf basic.auth.credentials.source=USER_INFO
--hoodie-conf schema.registry.basic.auth.user.info=REDACTED
--hoodie-conf hoodie.streamer.schemaprovider.registry.url=https://[email protected]/subjects/REDACTED-value/versions/latest
--hoodie-conf hoodie.datasource.write.recordkey.field=id
--hoodie-conf hoodie.datasource.write.partitionpath.field=inserted_at_date
--hoodie-conf hoodie.datasource.write.hive_style_partitioning=True
--hoodie-conf hoodie.datasource.write.precombine.field=updated_at
--hoodie-conf hoodie.datasource.write.operation=UPSERT
--hoodie-conf hoodie.streamer.source.kafka.topic=REDACTED
--hoodie-conf hoodie.streamer.source.kafka.value.deserializer.class=org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer
--hoodie-conf group.id=hudi-deltastreamer
--hoodie-conf auto.offset.reset=earliest
--hoodie-conf hoodie.write.concurrency.mode=SINGLE_WRITER
--hoodie-conf hoodie.datasource.write.drop.partition.columns=True
--hoodie-conf hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
--hoodie-conf hoodie.cleaner.policy.failed.writes=EAGER
--hoodie-conf hoodie.client.heartbeat.interval_in_ms=120000
--hoodie-conf hoodie.client.heartbeat.tolerable.misses=10
--hoodie-conf hoodie.keep.min.commits=100
--hoodie-conf hoodie.keep.max.commits=130
--hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS
--hoodie-conf hoodie.clean.automatic=true
--hoodie-conf hoodie.cleaner.commits.retained=50
--hoodie-conf hoodie.cleaner.hours.retained=72
--hoodie-conf hoodie.metadata.enable=True
--hoodie-conf hoodie.datasource.write.schema.allow.auto.evolution.column.drop=false
--hoodie-conf hoodie.write.set.null.for.missing.columns=true
--hoodie-conf hoodie.write.commit.callback.http.url=https://REDACTED/v1/hudi_commit
--hoodie-conf hoodie.write.commit.callback.on=true
--hoodie-conf request.timeout.ms=90000
--hoodie-conf session.timeout.ms=120000
--hoodie-conf heartbeat.interval.ms=5000
--hoodie-conf retry.backoff.ms=500
--hoodie-conf hoodie.gcp.bigquery.sync.project_id=REDACTED
--hoodie-conf hoodie.gcp.bigquery.sync.dataset_name=REDACTED
--hoodie-conf hoodie.gcp.bigquery.sync.dataset_location=us-central1
--hoodie-conf hoodie.gcp.bigquery.sync.source_uri=gs://path/to/REDACTED/inserted_at_date=*
--hoodie-conf hoodie.gcp.bigquery.sync.use_bq_manifest_file=True
--hoodie-conf hoodie.gcp.bigquery.sync.source_uri_prefix=gs://path/to/REDACTED
--hoodie-conf hoodie.gcp.bigquery.sync.base_path=gs://path/to/REDACTED
--hoodie-conf hoodie.gcp.bigquery.sync.table_name=REDACTED
--hoodie-conf hoodie.gcp.bigquery.sync.use_file_listing_from_metadata=True
--hoodie-conf hoodie.datasource.hive_sync.assume_date_partitioning=False
--hoodie-conf hoodie.partition.metafile.use.base.format=True
--hoodie-conf hoodie.gcp.bigquery.sync.partition_fields=inserted_at_date
--hoodie-conf hoodie.streamer.transformer.sql=SELECT *, CAST(FROM_UNIXTIME(inserted_at/1e6, 'yyyy-MM-dd') as STRING) AS inserted_at_date FROM <SRC>
--sync-tool-classes org.apache.hudi.gcp.bigquery.BigQuerySyncTool
--enable-sync
--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer

Stacktrace

Normal logs above...
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader: Merging the final data blocks
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 4
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 3
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 2
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.HoodieLogFileReader: Closing Log file reader .files-0000-0_20250129171658882001.log.4_0-15502-63677
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner: Number of log files scanned => 4
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner: MaxMemoryInBytes allowed for compaction => 1073741824
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner: Number of entries in MemoryBasedMap in ExternalSpillableMap => 2
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner: Total size in bytes of MemoryBasedMap in ExternalSpillableMap => 1152
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner: Number of entries in DiskBasedMap in ExternalSpillableMap => 0
25/01/29 17:20:56 INFO org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner: Size of file spilled to disk => 0
25/01/29 17:20:56 INFO org.apache.hudi.metadata.HoodieBackedTableMetadata: Opened 4 metadata log files (dataset instant=20250129172015176, metadata instant=20250129172015176) in 5839 ms
25/01/29 17:20:56 INFO org.apache.hudi.metadata.BaseTableMetadata: Listed file in partition from metadata: partition=inserted_at_date=2025-01-29, #files=53
25/01/29 17:20:56 WARN org.apache.hudi.timeline.service.RequestHandler: Bad request response due to client view behind server view. Last known instant from client was 20250129171945640 but server has the following timeline [[20241205021154706__rollback__COMPLETED__20241205021206181], [20241205202627954__rollback__COMPLETED__20241205202644147], [20241206023048660__rollback__COMPLETED__20241206023100273], [20241206120042030__rollback__COMPLETED__20241206120056023], [20241207121547517__rollback__COMPLETED__20241207121600102], [20241208124320499__rollback__COMPLETED__20241208124331342], [20241208184223859__rollback__COMPLETED__20241208184236269], [20241209130237659__rollback__COMPLETED__20241209130248678], [20241210013009641__rollback__COMPLETED__20241210013015878], [20241210132641327__rollback__COMPLETED__20241210132653975], [20241211013711661__rollback__COMPLETED__20241211013720024], [20241211195455981__rollback__COMPLETED__20241211195501942], [20241213164945031__rollback__COMPLETED__20241213164952305], [20241215074906951__rollback__COMPLETED__20241215074918101], [20241215135545211__rollback__COMPLETED__20241215135552173], [20241215202347053__rollback__COMPLETED__20241215202354115], [20241216090537383__rollback__COMPLETED__20241216090548509], [20241216151222763__rollback__COMPLETED__20241216151229435], [20241216212334558__rollback__COMPLETED__20241216212345466], [20241217032427149__rollback__COMPLETED__20241217032434509], [20241218015640090__rollback__COMPLETED__20241218015646706], [20241218080247557__rollback__COMPLETED__20241218080254777], [20241218140814608__rollback__COMPLETED__20241218140826819], [20241218201440734__rollback__COMPLETED__20241218201448153], [20241219203517798__rollback__COMPLETED__20241219203529858], [20241220041318101__rollback__COMPLETED__20241220041324009], [20241220161818956__rollback__COMPLETED__20241220161825286], [20241220223726836__rollback__COMPLETED__20241220223734221], [20241221042905591__rollback__COMPLETED__20241221042911897], [20241221163713019__rollback__COMPLETED__20241221163719660], [20241221230937539__rollback__COMPLETED__20241221230945303], [20241222075226455__rollback__COMPLETED__20241222075239199], [20241222200603311__rollback__COMPLETED__20241222200614505], [20241223142313252__rollback__COMPLETED__20241223142320915], [20241224024304414__rollback__COMPLETED__20241224024317083], [20241224084536379__rollback__COMPLETED__20241224084549316], [20241224210111099__rollback__COMPLETED__20241224210125340], [20241225155313108__rollback__COMPLETED__20241225155320746], [20241226084753589__rollback__COMPLETED__20241226084759940], [20241226145828024__rollback__COMPLETED__20241226145839545], [20241226234947347__rollback__COMPLETED__20241226235000441], [20241227055222471__rollback__COMPLETED__20241227055229513], [20241227183132445__rollback__COMPLETED__20241227183138415], [20241228122620699__rollback__COMPLETED__20241228122627608], [20241229005325869__rollback__COMPLETED__20241229005339610], [20241229194751822__rollback__COMPLETED__20241229194758186], [20241230015524085__rollback__COMPLETED__20241230015536426], [20241230205953689__rollback__COMPLETED__20241230210000474], [20241231081830072__rollback__COMPLETED__20241231081836159], [20241231142950851__rollback__COMPLETED__20241231142956866], [20241231203344986__rollback__COMPLETED__20241231203352130], [20250101024540624__rollback__COMPLETED__20250101024546658], [20250101172059033__rollback__COMPLETED__20250101172113897], [20250101221200035__rollback__COMPLETED__20250101221209742], 
[20250102124127122__rollback__COMPLETED__20250102124133595], [20250102184719584__rollback__COMPLETED__20250102184726380], [20250103130307247__rollback__COMPLETED__20250103130321644], [20250104011009695__rollback__COMPLETED__20250104011016805], [20250104081118919__rollback__COMPLETED__20250104081124861], [20250104142057676__rollback__COMPLETED__20250104142103487], [20250104230326058__rollback__COMPLETED__20250104230332142], [20250105051103078__rollback__COMPLETED__20250105051108714], [20250105111345654__rollback__COMPLETED__20250105111351572], [20250106003223638__rollback__COMPLETED__20250106003231597], [20250106134321061__rollback__COMPLETED__20250106134331332], [20250106205553704__rollback__COMPLETED__20250106205559809], [20250107202526922__rollback__COMPLETED__20250107202534770], [20250108083635690__rollback__COMPLETED__20250108083649040], [20250108144512489__rollback__COMPLETED__20250108144518520], [20250108205200036__rollback__COMPLETED__20250108205212851], [20250109085947536__rollback__COMPLETED__20250109085953718], [20250110092504788__rollback__COMPLETED__20250110092512369], [20250110152930407__rollback__COMPLETED__20250110152941797], [20250111094445147__rollback__COMPLETED__20250111094457832], [20250112040003387__rollback__COMPLETED__20250112040011452], [20250112100646945__rollback__COMPLETED__20250112100653180], [20250112191857010__rollback__COMPLETED__20250112191912022], [20250113012634105__rollback__COMPLETED__20250113012650527], [20250113072928535__rollback__COMPLETED__20250113072936362], [20250113194733825__rollback__COMPLETED__20250113194740992], [20250114081553947__rollback__COMPLETED__20250114081559996], [20250114142618722__rollback__COMPLETED__20250114142632761], [20250114203323087__rollback__COMPLETED__20250114203331108], [20250115084034713__rollback__COMPLETED__20250115084046015], [20250116074855165__rollback__COMPLETED__20250116074909768], [20250117020739078__rollback__COMPLETED__20250117020748455], [20250117080932432__rollback__COMPLETED__20250117080940534], [20250117185753948__rollback__COMPLETED__20250117185802164], [20250118070620223__rollback__COMPLETED__20250118070633142], [20250118130814886__rollback__COMPLETED__20250118130821862], [20250118191619832__rollback__COMPLETED__20250118191629099], [20250119194317320__rollback__COMPLETED__20250119194324020], [20250120015339307__rollback__COMPLETED__20250120015347982], [20250120075636207__rollback__COMPLETED__20250120075643696], [20250120140428889__rollback__COMPLETED__20250120140436892], [20250120201051645__rollback__COMPLETED__20250120201100713], [20250121021717934__rollback__COMPLETED__20250121021725880], [20250121195111148__rollback__COMPLETED__20250121195118997], [20250122015335680__rollback__COMPLETED__20250122015343930], [20250122080128413__rollback__COMPLETED__20250122080135545], [20250122140835814__rollback__COMPLETED__20250122140842981], [20250123021720129__rollback__COMPLETED__20250123021733816], [20250123123259389__rollback__COMPLETED__20250123123307244], [20250123183816172__rollback__COMPLETED__20250123183823065], [20250124004321948__rollback__COMPLETED__20250124004329251], [20250124064747343__rollback__COMPLETED__20250124064753698], [20250124185741408__rollback__COMPLETED__20250124185756080], [20250125095930863__rollback__COMPLETED__20250125095939001], [20250125160133794__rollback__COMPLETED__20250125160148969], [20250126041451739__rollback__COMPLETED__20250126041505600], [20250126102142409__rollback__COMPLETED__20250126102149689], [20250126162925860__rollback__COMPLETED__20250126162939660], 
[20250126223515204__rollback__COMPLETED__20250126223527369], [20250127165236342__rollback__COMPLETED__20250127165248139], [20250127234711607__rollback__COMPLETED__20250127234720093], [20250128054842803__rollback__COMPLETED__20250128054850858], [20250128164505372__rollback__COMPLETED__20250128164512069], [20250128182045853__rollback__COMPLETED__20250128182051931], [20250129045342148__rollback__COMPLETED__20250129045414629], [20250129111559632__rollback__COMPLETED__20250129111605532], [20250129144650460__commit__COMPLETED__20250129144728971], [20250129144755725__commit__COMPLETED__20250129144833182], [20250129144900572__commit__COMPLETED__20250129144939118], [20250129145007342__commit__COMPLETED__20250129145059380], [20250129145125049__commit__COMPLETED__20250129145204620], [20250129145235636__commit__COMPLETED__20250129145315450], [20250129145343835__commit__COMPLETED__20250129145423661], [20250129145506711__commit__COMPLETED__20250129145605140], [20250129145633864__commit__COMPLETED__20250129145725698], [20250129145751582__commit__COMPLETED__20250129145828604], [20250129145855079__commit__COMPLETED__20250129145933349], [20250129150001412__commit__COMPLETED__20250129150041430], [20250129150109565__commit__COMPLETED__20250129150150005], [20250129150218937__commit__COMPLETED__20250129150311304], [20250129150338484__commit__COMPLETED__20250129150418883], [20250129150446477__commit__COMPLETED__20250129150529254], [20250129150529507__clean__COMPLETED__20250129150542644], [20250129150601572__commit__COMPLETED__20250129150641239], [20250129150641459__clean__COMPLETED__20250129150654479], [20250129150710124__commit__COMPLETED__20250129150750391], [20250129150750625__clean__COMPLETED__20250129150803546], [20250129150820286__commit__COMPLETED__20250129150917970], [20250129150918242__clean__COMPLETED__20250129150930913], [20250129150945038__commit__COMPLETED__20250129151024753], [20250129151024991__clean__COMPLETED__20250129151037531], [20250129151051729__commit__COMPLETED__20250129151133558], [20250129151133775__clean__COMPLETED__20250129151146330], [20250129151201108__commit__COMPLETED__20250129151242088], [20250129151242338__clean__COMPLETED__20250129151255672], [20250129151328192__commit__COMPLETED__20250129151409053], [20250129151409323__clean__COMPLETED__20250129151423321], [20250129151439503__commit__COMPLETED__20250129151536640], [20250129151536895__clean__COMPLETED__20250129151549238], [20250129151604128__commit__COMPLETED__20250129151644129], [20250129151644379__clean__COMPLETED__20250129151656869], [20250129151711104__commit__COMPLETED__20250129151751832], [20250129151752056__clean__COMPLETED__20250129151804663], [20250129151818670__commit__COMPLETED__20250129151901641], [20250129151901945__clean__COMPLETED__20250129151915248], [20250129151929765__commit__COMPLETED__20250129152009953], [20250129152010225__clean__COMPLETED__20250129152023304], [20250129152037508__commit__COMPLETED__20250129152131896], [20250129152132132__clean__COMPLETED__20250129152144319], [20250129152158169__commit__COMPLETED__20250129152237964], [20250129152238185__clean__COMPLETED__20250129152250141], [20250129152303909__commit__COMPLETED__20250129152345318], [20250129152345931__clean__COMPLETED__20250129152400485], [20250129152415381__commit__COMPLETED__20250129152457119], [20250129152457361__clean__COMPLETED__20250129152510624], [20250129152525653__commit__COMPLETED__20250129152607620], [20250129152607849__clean__COMPLETED__20250129152620982], [20250129152636241__commit__COMPLETED__20250129152732099], 
[20250129152732322__clean__COMPLETED__20250129152744720], [20250129152758840__commit__COMPLETED__20250129152842076], [20250129152842295__clean__COMPLETED__20250129152855378], [20250129152909868__commit__COMPLETED__20250129152958538], [20250129152958829__clean__COMPLETED__20250129153011825], [20250129153025948__commit__COMPLETED__20250129153108564], [20250129153108778__clean__COMPLETED__20250129153121544], [20250129153136712__commit__COMPLETED__20250129153219886], [20250129153220151__clean__COMPLETED__20250129153233474], [20250129153303159__commit__COMPLETED__20250129153417172], [20250129153417451__clean__COMPLETED__20250129153429728], [20250129153444075__commit__COMPLETED__20250129153523432], [20250129153523676__clean__COMPLETED__20250129153536077], [20250129153550253__commit__COMPLETED__20250129153631240], [20250129153631466__clean__COMPLETED__20250129153643820], [20250129153658549__commit__COMPLETED__20250129153740008], [20250129153740231__clean__COMPLETED__20250129153753214], [20250129153807742__commit__COMPLETED__20250129153849045], [20250129153849286__clean__COMPLETED__20250129153902438], [20250129153917687__commit__COMPLETED__20250129154011387], [20250129154011651__clean__COMPLETED__20250129154023653], [20250129154037728__commit__COMPLETED__20250129154118508], [20250129154118752__clean__COMPLETED__20250129154130668], [20250129154144895__commit__COMPLETED__20250129154225805], [20250129154226057__clean__COMPLETED__20250129154238259], [20250129154252672__commit__COMPLETED__20250129154334626], [20250129154334879__clean__COMPLETED__20250129154347478], [20250129154402345__commit__COMPLETED__20250129154444177], [20250129154444393__clean__COMPLETED__20250129154456911], [20250129154513206__commit__COMPLETED__20250129154608849], [20250129154609083__clean__COMPLETED__20250129154621320], [20250129154634849__commit__COMPLETED__20250129154715212], [20250129154715436__clean__COMPLETED__20250129154727711], [20250129154741514__commit__COMPLETED__20250129154823502], [20250129154823716__clean__COMPLETED__20250129154835967], [20250129154850197__commit__COMPLETED__20250129154932509], [20250129154932806__clean__COMPLETED__20250129154945636], [20250129154959848__commit__COMPLETED__20250129155042578], [20250129155042809__clean__COMPLETED__20250129155055795], [20250129155130871__commit__COMPLETED__20250129155226281], [20250129155226567__clean__COMPLETED__20250129155238930], [20250129155253213__commit__COMPLETED__20250129155336878], [20250129155337109__clean__COMPLETED__20250129155349216], [20250129155403508__commit__COMPLETED__20250129155450123], [20250129155450328__clean__COMPLETED__20250129155502484], [20250129155516768__commit__COMPLETED__20250129155559819], [20250129155600052__clean__COMPLETED__20250129155612306], [20250129155626663__commit__COMPLETED__20250129155711398], [20250129155711669__clean__COMPLETED__20250129155723952], [20250129155738410__commit__COMPLETED__20250129155835381], [20250129155835614__clean__COMPLETED__20250129155847882], [20250129155901387__commit__COMPLETED__20250129155943814], [20250129155944049__clean__COMPLETED__20250129155956951], [20250129160020772__commit__COMPLETED__20250129160112944], [20250129160113181__clean__COMPLETED__20250129160126303], [20250129160140996__commit__COMPLETED__20250129160226070], [20250129160226325__clean__COMPLETED__20250129160239500], [20250129160256235__commit__COMPLETED__20250129160343494], [20250129160343736__clean__COMPLETED__20250129160358294], [20250129160415283__commit__COMPLETED__20250129160518077], 
[20250129160518347__clean__COMPLETED__20250129160531994], [20250129160547169__commit__COMPLETED__20250129160633175], [20250129160633417__clean__COMPLETED__20250129160646780], [20250129160701209__commit__COMPLETED__20250129160747557], [20250129160747786__clean__COMPLETED__20250129160801046], [20250129160816784__commit__COMPLETED__20250129160902856], [20250129160903088__clean__COMPLETED__20250129160916520], [20250129160934369__commit__COMPLETED__20250129161019755], [20250129161019996__clean__COMPLETED__20250129161033489], [20250129161051084__commit__COMPLETED__20250129161149620], [20250129161149851__clean__COMPLETED__20250129161203243], [20250129161233346__commit__COMPLETED__20250129161335181], [20250129161335399__clean__COMPLETED__20250129161347956], [20250129161403005__commit__COMPLETED__20250129161447540], [20250129161447758__clean__COMPLETED__20250129161500184], [20250129161516792__commit__COMPLETED__20250129161601129], [20250129161601340__clean__COMPLETED__20250129161614533], [20250129161631399__commit__COMPLETED__20250129161715914], [20250129161716140__clean__COMPLETED__20250129161728799], [20250129161743721__commit__COMPLETED__20250129161853584], [20250129161853813__clean__COMPLETED__20250129161906394], [20250129161920102__commit__COMPLETED__20250129162003603], [20250129162003831__clean__COMPLETED__20250129162016089], [20250129162032414__commit__COMPLETED__20250129162116472], [20250129162116692__clean__COMPLETED__20250129162128885], [20250129162143135__commit__COMPLETED__20250129162227447], [20250129162227694__clean__COMPLETED__20250129162240427], [20250129162254970__commit__COMPLETED__20250129162339665], [20250129162339887__clean__COMPLETED__20250129162352996], [20250129162409151__commit__COMPLETED__20250129162507740], [20250129162507952__clean__COMPLETED__20250129162520684], [20250129162535039__commit__COMPLETED__20250129162620710], [20250129162620997__clean__COMPLETED__20250129162633544], [20250129162648950__commit__COMPLETED__20250129162734251], [20250129162734504__clean__COMPLETED__20250129162747070], [20250129162801702__commit__COMPLETED__20250129162846318], [20250129162846557__clean__COMPLETED__20250129162859009], [20250129162913781__commit__COMPLETED__20250129162958685], [20250129162958931__clean__COMPLETED__20250129163012338], [20250129163028044__commit__COMPLETED__20250129163131208], [20250129163131416__clean__COMPLETED__20250129163143895], [20250129163216136__commit__COMPLETED__20250129163302363], [20250129163302573__clean__COMPLETED__20250129163315181], [20250129163330109__commit__COMPLETED__20250129163416330], [20250129163416565__clean__COMPLETED__20250129163429376], [20250129163443601__commit__COMPLETED__20250129163530521], [20250129163530752__clean__COMPLETED__20250129163543773], [20250129163558628__commit__COMPLETED__20250129163645628], [20250129163645860__clean__COMPLETED__20250129163659022], [20250129163714790__commit__COMPLETED__20250129163815006], [20250129163815266__clean__COMPLETED__20250129163827688], [20250129163841877__commit__COMPLETED__20250129163927816], [20250129163928055__clean__COMPLETED__20250129163940604], [20250129163955187__commit__COMPLETED__20250129164041489], [20250129164041709__clean__COMPLETED__20250129164054258], [20250129164108605__commit__COMPLETED__20250129164155578], [20250129164155777__clean__COMPLETED__20250129164208329], [20250129164223761__commit__COMPLETED__20250129164311035], [20250129164311273__clean__COMPLETED__20250129164324452], [20250129164339682__commit__COMPLETED__20250129164439538], 
[20250129164439801__clean__COMPLETED__20250129164452443], [20250129164509223__commit__COMPLETED__20250129164557955], [20250129164558199__clean__COMPLETED__20250129164611451], [20250129164626041__commit__COMPLETED__20250129164714299], [20250129164714535__clean__COMPLETED__20250129164727304], [20250129164741841__commit__COMPLETED__20250129164829382], [20250129164829621__clean__COMPLETED__20250129164843027], [20250129164858396__commit__COMPLETED__20250129164947188], [20250129164947452__clean__COMPLETED__20250129165001042], [20250129165017514__commit__COMPLETED__20250129165119678], [20250129165119909__clean__COMPLETED__20250129165132735], [20250129165147162__commit__COMPLETED__20250129165236418], [20250129165236645__clean__COMPLETED__20250129165249467], [20250129165318819__commit__COMPLETED__20250129165424263], [20250129165424510__clean__COMPLETED__20250129165436979], [20250129165451517__commit__COMPLETED__20250129165538453], [20250129165538679__clean__COMPLETED__20250129165551141], [20250129165605776__commit__COMPLETED__20250129165653320], [20250129165653549__clean__COMPLETED__20250129165706398], [20250129165721525__commit__COMPLETED__20250129165820775], [20250129165820997__clean__COMPLETED__20250129165833124], [20250129165847434__commit__COMPLETED__20250129165933313], [20250129165933543__clean__COMPLETED__20250129165945439], [20250129165959552__commit__COMPLETED__20250129170046061], [20250129170046306__clean__COMPLETED__20250129170058517], [20250129170113374__commit__COMPLETED__20250129170201779], [20250129170202034__clean__COMPLETED__20250129170215172], [20250129170230180__commit__COMPLETED__20250129170317910], [20250129170318138__clean__COMPLETED__20250129170331031], [20250129170346440__commit__COMPLETED__20250129170450369], [20250129170450584__clean__COMPLETED__20250129170503208], [20250129170518901__commit__COMPLETED__20250129170608073], [20250129170608298__clean__COMPLETED__20250129170621025], [20250129170635622__commit__COMPLETED__20250129170728059], [20250129170728404__clean__COMPLETED__20250129170742148], [20250129170756528__commit__COMPLETED__20250129170844664], [20250129170844897__clean__COMPLETED__20250129170857538], [20250129170912422__commit__COMPLETED__20250129171001185], [20250129171001427__clean__COMPLETED__20250129171015179], [20250129171030854__commit__COMPLETED__20250129171133752], [20250129171133999__clean__COMPLETED__20250129171146687], [20250129171200333__commit__COMPLETED__20250129171248728], [20250129171248968__clean__COMPLETED__20250129171301587], [20250129171331978__commit__COMPLETED__20250129171421510], [20250129171421780__clean__COMPLETED__20250129171434902], [20250129171450176__commit__COMPLETED__20250129171540125], [20250129171540361__clean__COMPLETED__20250129171553762], [20250129171609419__commit__COMPLETED__20250129171658647], [20250129171658882__clean__COMPLETED__20250129171711880], [20250129171726915__commit__COMPLETED__20250129171829523], [20250129171829775__clean__COMPLETED__20250129171842211], [20250129171855851__commit__COMPLETED__20250129171945371], [20250129171945640__clean__COMPLETED__20250129171958578], [20250129172015176__rollback__COMPLETED__20250129172019946]]
25/01/29 17:21:04 INFO org.apache.hudi.common.table.view.AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
25/01/29 17:21:04 INFO org.apache.hudi.common.util.ClusteringUtils: Found 0 files in pending clustering operations
25/01/29 17:21:04 INFO org.apache.hudi.table.action.commit.UpsertPartitioner: Total Buckets: 1
25/01/29 17:21:04 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline: Checking for file exists ?gs://path/to/REDACTED/.hoodie/20250129171955324.commit.requested
25/01/29 17:21:04 INFO org.apache.hudi.utilities.streamer.HoodieStreamer: Delta Sync shutdown. Error ?false
25/01/29 17:21:04 INFO org.apache.hudi.utilities.streamer.HoodieStreamer: Ingestion completed. Has error: true
25/01/29 17:21:04 INFO org.apache.hudi.client.transaction.TransactionManager: Transaction manager closed
25/01/29 17:21:04 INFO org.apache.hudi.client.transaction.TransactionManager: Transaction manager closed
25/01/29 17:21:04 INFO org.apache.hudi.utilities.streamer.StreamSync: Shutting down embedded timeline server
25/01/29 17:21:04 ERROR org.apache.hudi.async.HoodieAsyncService: Service shutdown with error
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20250129171955324
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
	at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:65)
	at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
	at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:207)
	at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:592)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:976)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1064)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1073)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20250129171955324
	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:70)
	at org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:44)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:114)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:103)
	at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:142)
	at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:920)
	at org.apache.hudi.utilities.streamer.StreamSync.writeToSinkAndDoMetaSync(StreamSync.java:778)
	at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:450)
	at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:767)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:33)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:618)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:683)
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:156)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:179)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:86)
	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:63)
	... 12 more
25/01/29 17:21:04 INFO org.apache.hudi.client.embedded.EmbeddedTimelineService: Closing Timeline server
25/01/29 17:21:04 INFO org.apache.hudi.timeline.service.TimelineService: Closing Timeline Service
25/01/29 17:21:04 INFO io.javalin.Javalin: Stopping Javalin ...
[dd.trace 2025-01-29 17:21:04:605 +0000] [spark-listener-group-shared] INFO datadog.trace.instrumentation.spark.AbstractDatadogSparkListener - Received spark application end event, finish trace on this event: false
25/01/29 17:21:04 INFO org.sparkproject.jetty.server.AbstractConnector: Stopped Spark@10e7d5c6{HTTP/1.1, (http/1.1)}{0.0.0.0:8095}
25/01/29 17:21:04 ERROR io.javalin.Javalin: Javalin failed to stop gracefully
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
	at org.apache.hudi.org.eclipse.jetty.server.AbstractConnector.doStop(AbstractConnector.java:373)
	at org.apache.hudi.org.eclipse.jetty.server.AbstractNetworkConnector.doStop(AbstractNetworkConnector.java:88)
	at org.apache.hudi.org.eclipse.jetty.server.ServerConnector.doStop(ServerConnector.java:246)
	at org.apache.hudi.org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:94)
	at org.apache.hudi.org.eclipse.jetty.server.Server.doStop(Server.java:459)
	at org.apache.hudi.org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:94)
	at io.javalin.Javalin.stop(Javalin.java:209)
	at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:408)
	at org.apache.hudi.client.embedded.EmbeddedTimelineService.stopForBasePath(EmbeddedTimelineService.java:249)
	at org.apache.hudi.utilities.streamer.StreamSync.close(StreamSync.java:1191)
	at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.close(HoodieStreamer.java:936)
	at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.onIngestionCompletes(HoodieStreamer.java:924)
	at org.apache.hudi.async.HoodieAsyncService.lambda$shutdownCallback$0(HoodieAsyncService.java:171)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
[dd.trace 2025-01-29 17:21:04:665 +0000] [main] INFO datadog.trace.instrumentation.spark.AbstractDatadogSparkListener - Finishing spark application trace
25/01/29 17:22:34 INFO com.google.cloud.dataproc.DataprocSparkPlugin: Shutting down driver plugin. metrics=[files_created=2, gcs_api_server_not_implemented_error_count=0, gcs_api_server_timeout_count=0, action_http_post_request_failures=0, op_get_list_status_result_size=39604, op_open=428, gcs_api_client_unauthorized_response_count=0, action_http_head_request_failures=0, stream_read_close_operations=428, stream_read_bytes_backwards_on_seek=644830, exception_count=108, gcs_api_total_request_count=1039, op_create=2, gcs_api_client_bad_request_count=0, op_create_non_recursive=0, gcs_api_client_gone_response_count=0, stream_write_operations=0, stream_read_operations=984, gcs_api_client_request_timeout_count=0, op_rename=0, op_get_file_status=50, stream_read_total_bytes=0, op_glob_status=0, stream_read_exceptions=0, action_http_get_request_failures=0, op_exists=0, stream_write_bytes=604818, op_xattr_list=0, stream_write_exceptions=0, gcs_api_server_unavailable_count=0, directories_created=0, files_delete_rejected=0, op_xattr_get_named=0, op_hsync=0, stream_read_operations_incomplete=875, op_delete=0, stream_read_bytes=1156034, gcs_api_client_non_found_response_count=91, gcs_api_client_requested_range_not_statisfiable_count=0, op_hflush=0, op_list_status=35, op_xattr_get_named_map=0, gcs_api_client_side_error_count=194, op_get_file_checksum=0, action_http_delete_request_failures=0, gcs_api_server_internal_error_count=0, stream_read_seek_bytes_skipped=1122974, stream_write_close_operations=1, op_list_files=0, files_deleted=0, op_mkdirs=1, gcs_api_client_rate_limit_error_count=0, action_http_put_request_failures=0, gcs_api_server_bad_gateway_count=0, stream_read_seek_backward_operations=28, gcs_api_server_side_error_count=0, action_http_patch_request_failures=0, stream_read_seek_operations=44, stream_read_seek_forward_operations=16, gcs_api_client_precondition_failed_response_count=1, directories_deleted=0, op_xattr_get_map=0, delegation_tokens_issued=0, op_create_min=49, op_delete_min=0, op_mkdirs_min=455, op_create_non_recursive_min=0, op_glob_status_min=0, op_hsync_min=0, op_xattr_get_named_min=0, op_list_status_min=28, op_xattr_get_named_map_min=0, stream_read_close_operations_min=0, stream_read_operations_min=0, stream_read_seek_operations_min=0, op_hflush_min=0, op_xattr_get_map_min=0, op_xattr_list_min=0, stream_write_operations_min=0, op_get_file_status_min=11, op_open_min=8, op_rename_min=0, delegation_tokens_issued_min=0, stream_write_close_operations_min=119, stream_read_close_operations_max=0, stream_read_operations_max=176, stream_read_seek_operations_max=0, op_hflush_max=0, op_xattr_list_max=0, op_xattr_get_map_max=0, op_xattr_get_named_max=0, op_create_non_recursive_max=0, op_glob_status_max=0, op_get_file_status_max=341, stream_write_close_operations_max=119, op_open_max=89, delegation_tokens_issued_max=0, op_mkdirs_max=455, op_rename_max=0, op_create_max=98, op_delete_max=0, op_list_status_max=178, op_xattr_get_named_map_max=0, stream_write_operations_max=0, op_hsync_max=0, op_list_status_mean=105, stream_read_close_operations_mean=0, op_open_mean=13, op_xattr_get_named_map_mean=0, op_xattr_list_mean=0, op_mkdirs_mean=455, stream_write_close_operations_mean=119, op_rename_mean=0, op_hsync_mean=0, delegation_tokens_issued_mean=0, stream_read_operations_mean=12, op_xattr_get_map_mean=0, op_create_mean=73, op_glob_status_mean=0, op_delete_mean=0, stream_read_seek_operations_mean=0, stream_write_operations_mean=0, op_create_non_recursive_mean=0, op_hflush_mean=0, 
op_xattr_get_named_mean=0, op_get_file_status_mean=42, stream_write_operations_duration=0, stream_read_operations_duration=11808]
[dd.trace 2025-01-29 17:22:35:137 +0000] [main] INFO datadog.trace.instrumentation.spark.AbstractDatadogSparkListener - Finishing spark application trace
Exception in thread "main" org.apache.hudi.utilities.ingestion.HoodieIngestionException: Ingestion service was shut down with exception.
	at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:67)
	at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
	at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:207)
	at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:592)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:976)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1064)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1073)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20250129171955324
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
	at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:65)
	... 15 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20250129171955324
	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:70)
	at org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:44)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:114)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:103)
	at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:142)
	at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:920)
	at org.apache.hudi.utilities.streamer.StreamSync.writeToSinkAndDoMetaSync(StreamSync.java:778)
	at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:450)
	at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:767)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:33)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:618)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:683)
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:156)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:179)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:86)
	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:63)
	... 12 more
@rangareddy
Contributor

Hi @sweir-thescore

Are there multiple writers writing to the same table/path? Also, is it possible to share the Hoodie timeline so I can check it further on my end?

@sweir-thescore
Author

Hey @rangareddy,

For part 1 of your question, I believe you are asking about the hoodie.embed.timeline.server.reuse.enabled setting, for which we use the default of false. We do not believe there are multiple writers, because we have a dispatch service that only kicks off a single Dataproc job at a time for a particular ingest pipeline. If a Dataproc job is already running for the pipeline, the dispatch service does not start another one. Pairing that with SINGLE_WRITER mode, we do not believe there could be multiple writers.
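
For reference, a sketch of the writer-related settings as we understand them; the first two flags are exactly as they appear in our submit command, and the commented line only illustrates the default value of the reuse setting (we do not set it explicitly):

```bash
--hoodie-conf hoodie.write.concurrency.mode=SINGLE_WRITER
--hoodie-conf hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
# hoodie.embed.timeline.server.reuse.enabled is left at its default of false
```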

As for the second question, can you clarify which files you are specifically looking for from the Hoodie timeline? We had to sort out the issue in the interim because production data needed to be ingested, but we may still have historical, non-current Hoodie metadata files if there are particular ones you are asking for.

@ad1happy2go
Collaborator

@sweir-thescore Can you share the contents inside .hoodie? A screenshot of the directory listing also works. It doesn't contain any real data, it's just metadata.
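
For example, something along these lines should capture what is being asked for (a sketch only, assuming gsutil access to the redacted base path):

```bash
# Dump the active timeline directory (metadata files only, no table data).
gsutil ls "gs://path/to/REDACTED/.hoodie/" > hoodie_timeline_listing.txt

# Optionally include the archived timeline and the metadata table folder as well.
gsutil ls "gs://path/to/REDACTED/.hoodie/archived/" >> hoodie_timeline_listing.txt
gsutil ls -r "gs://path/to/REDACTED/.hoodie/metadata/" >> hoodie_timeline_listing.txt
```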

@sydneyhoran

sydneyhoran commented Feb 13, 2025

Hi @ad1happy2go - I am @sweir-thescore's teammate. We don't have any live examples, as we had to repair them all in real time, but this is an example of what it looks like when we have an incomplete rollback (only the top two .rollback.requested and .rollback.inflight files); you can ignore the highlighted file. Once these two files are manually deleted, jobs typically succeed, but they may eventually get out of sync again or throw one of the other types of errors (see the cleanup sketch after the error examples below).

[Screenshot: listing of the .hoodie directory showing only the .rollback.requested and .rollback.inflight files for the incomplete rollback]

Every subsequent job will fail and throw an error like:

Caused by: org.apache.hudi.exception.HoodieRollbackException: Found commits after time :20240913164753886, please rollback greater commits first

org.apache.hudi.timeline.service.RequestHandler: Bad request response due to client view behind server view

HoodieMetadataException: Metadata table's deltacommits exceeded 1000: this is likely caused by a pending instant in the data table

Caused by: org.apache.hudi.exception.HoodieIOException: Failed to read footer for parquet gs://.../inserted_at_date=2025-01-19/..._20250129044440519.parquet

Caused by: java.io.FileNotFoundException: File not found: gs://.../inserted_at_date=2025-01-19/..._20250129044440519.parquet

(although these may be separate issues on their own)
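
For completeness, the manual repair mentioned above is roughly the following (a sketch only; `<instant>` is a placeholder for the dangling rollback instant, and we only do this once we are sure no writer is still running against the table):

```bash
# Find rollback instants that have no completed .rollback file.
gsutil ls "gs://path/to/REDACTED/.hoodie/" | grep rollback

# Delete the two incomplete files so the next run can proceed.
gsutil rm "gs://path/to/REDACTED/.hoodie/<instant>.rollback.requested"
gsutil rm "gs://path/to/REDACTED/.hoodie/<instant>.rollback.inflight"
```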

Of note, we only see this error when running a GCE cluster with a dedicated driver pool. We switched back to the regular GCE cluster node type and no longer face this issue when a job is cancelled or fails. The Spark drivers on the dedicated driver pool also required about 5x more memory (i.e. 5 GB instead of 1 GB on the current cluster), and still sometimes hit OOMs (which can lead to the errors above).

We are also investigating this within GCP/Dataproc and re-planning how we want to architect the cluster, but these metadata/timeline issues were the primary reason we could not switch to the new cluster configuration. So we wanted to check if there are any thoughts here as well.

Thanks in advance!
