[SUPPORT] GCP GCE Dataproc job fails after checking whether the .commit.requested file exists
#12734
Comments
Is there any chance that multiple writers are writing to the same table/path? And is it possible to share the hoodie timeline so I can check further on my end?
Hey @rangareddy, for part 1 of your question, I believe you are asking about the setting hoodie.embed.timeline.server.reuse.enabled, for which we use the default of false. We do not believe there are multiple writers, because we have a dispatch service that only kicks off a single Dataproc job at a time for a particular ingest pipeline; if a Dataproc job is already running for the pipeline, the dispatch service does not start another one. Pairing that with SINGLE_WRITER mode, we do not believe there could be multiple writers. As for the second question, can you help define which files from the hoodie timeline you are looking for? We had to sort out the issue in the interim because production data needed to be ingested, but we may have historical, non-current hoodie metadata files if there are particular ones you are asking for.
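For reference, a minimal PySpark sketch of how these two settings might be applied on a Hudi upsert. The table name, record key/precombine fields, GCS path, and toy DataFrame are placeholders, not the reporter's actual pipeline; the two Hudi config keys are the ones discussed above.

```python
# Minimal sketch of a single-writer Hudi upsert (placeholder names/paths).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-single-writer-example").getOrCreate()

df = spark.createDataFrame([(1, "a", 1000)], ["id", "val", "ts"])  # toy data

hudi_options = {
    "hoodie.table.name": "example_table",                # placeholder
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",     # placeholder
    "hoodie.datasource.write.precombine.field": "ts",    # placeholder
    # Single-writer concurrency, as described above:
    "hoodie.write.concurrency.mode": "SINGLE_WRITER",
    # Timeline server reuse left at its default of false:
    "hoodie.embed.timeline.server.reuse.enabled": "false",
}

(df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("gs://example-bucket/tables/example_table"))   # placeholder path
```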
@sweir-thescore Can you share the contents inside .hoodie? A screenshot of the directory listing would also work. It doesn't contain any real data, it's just metadata.
Hi @ad1happy2go - I am @sweir-thescore's teammate. We don't have any live examples, as we had to repair them all in real time, but this is what it looks like when we have an incomplete rollback (only the top two rollback files are present):

[screenshot: .hoodie directory listing showing the incomplete rollback]

Every subsequent job will then fail and throw errors like those quoted under "Describe the problem you faced" below (although those may be separate issues of their own).

Of note, we only see this error when running a GCE cluster with a dedicated driver pool. We switched back to the regular node type of GCE cluster and no longer face this issue when a job is cancelled or fails. The Spark drivers on the dedicated driver pool also required about 5x more memory (i.e. 5 GB instead of 1 GB on the current cluster), and still sometimes hit OOMs (which lead to the errors above).

We are also investigating this within GCP/Dataproc and re-planning how we want to architect the cluster, but these metadata/timeline issues were the primary reason we could not switch to the new cluster configuration. So we wanted to check if there are any thoughts here as well. Thanks in advance!
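For anyone triaging a similar state, below is a rough sketch of how one might flag rollback instants that never completed by listing the .hoodie directory on GCS. This is not part of Hudi itself; the bucket and prefix are placeholders, and it assumes the 0.x timeline naming where a finished rollback leaves a suffix-less `<ts>.rollback` file alongside the `.rollback.requested` and `.rollback.inflight` markers.

```python
# Hypothetical helper: flag rollback instants that only have
# .rollback.requested / .rollback.inflight files in .hoodie.
from collections import defaultdict
from google.cloud import storage

BUCKET = "example-bucket"                   # placeholder
PREFIX = "tables/example_table/.hoodie/"    # placeholder

client = storage.Client()
states = defaultdict(set)
for blob in client.list_blobs(BUCKET, prefix=PREFIX):
    name = blob.name.rsplit("/", 1)[-1]     # bare file name
    if ".rollback" in name:
        instant, _, suffix = name.partition(".rollback")
        states[instant].add(suffix or "<completed>")

for instant, seen in sorted(states.items()):
    # A healthy rollback has the suffix-less completed file; flag
    # instants that never reached that state.
    if "<completed>" not in seen:
        print(f"Incomplete rollback at instant {instant}: {sorted(seen)}")
```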
Tips before filing an issue
Have you gone through our FAQs? Yes, and we also searched the relevant GitHub issues.
Join the mailing list to engage in conversations and get faster support at [email protected].
If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
Jobs fail with errors like:

```
org.apache.hudi.timeline.service.RequestHandler: Bad request response due to client view behind server view
common.table.timeline.HoodieActiveTimeline: Checking for file exists ?gs://REDACTED/.hoodie/20250129171955324.commit.requested
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20250129171955324
```

When this happens, the rollback is left incomplete (.rollback.requested and .rollback.inflight files only). The new GCE cluster is set up to use a dedicated driver pool.
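Since the failing step in the log above is a plain file-existence probe against GCS, a quick manual check of the same instant might look like the sketch below. The bucket and table path are placeholders standing in for the redacted ones, and this assumes the google-cloud-storage client library is available.

```python
# Hypothetical manual check mirroring Hudi's "Checking for file exists"
# log line; bucket and path are placeholders for the redacted values.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-bucket")  # placeholder
blob = bucket.blob(
    "tables/example_table/.hoodie/20250129171955324.commit.requested"
)
print("exists:", blob.exists())
```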
To Reproduce
Steps to reproduce the behavior:
It is currently unclear how we can reproduce this issue consistently ourselves.
Expected behavior
The Hudi timeline client view should not fall behind the server view and cause this failure.
Environment Description
Hudi version : 0.14.1
Spark version : 3.1.3
Hive version : 3.1.3
Hadoop version : 3.2.4
Storage (HDFS/S3/GCS..) : GCS
Running on Docker? (yes/no) : yes
Additional context
Stacktrace