
[HUDI-9058] Use InProcessLockProvider for flink single writer ingestion with async table services #12857

Open: wants to merge 1 commit into base: master

cshuo (Contributor) commented Feb 20, 2025

Change Logs

For Flink ingestion where the concurrency mode is single writer and async table services are enabled, use InProcessLockProvider by default.
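A minimal sketch of the default-selection rule described above, as a reader might model it. The helper class `LockProviderDefaults` and its method `chooseDefault` are hypothetical and are not this PR's code. The config keys (`hoodie.write.lock.provider`, `hoodie.write.concurrency.mode`, `compaction.async.enabled`, `clustering.async.enabled`) and the two provider class names are assumed to match Hudi's. Treating a missing key as `SINGLE_WRITER` or `false` is a simplifying assumption, as is returning the filesystem-based provider in every other case.

```java
import java.util.Map;

/**
 * Hypothetical model of the PR's default lock-provider rule.
 * Not Hudi's implementation; class names and keys are assumed to match Hudi's.
 */
final class LockProviderDefaults {
    static final String IN_PROCESS =
        "org.apache.hudi.client.transaction.lock.InProcessLockProvider";
    static final String FILE_SYSTEM =
        "org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider";

    private LockProviderDefaults() {}

    /** Returns the lock provider class name to use for the given write config. */
    static String chooseDefault(Map<String, String> conf) {
        // An explicitly configured provider always wins.
        String explicit = conf.get("hoodie.write.lock.provider");
        if (explicit != null) {
            return explicit;
        }
        // Simplifying assumption: missing keys count as SINGLE_WRITER / false.
        boolean singleWriter = "SINGLE_WRITER".equalsIgnoreCase(
            conf.getOrDefault("hoodie.write.concurrency.mode", "SINGLE_WRITER"));
        boolean asyncTableServices =
            Boolean.parseBoolean(conf.getOrDefault("compaction.async.enabled", "false"))
                || Boolean.parseBoolean(conf.getOrDefault("clustering.async.enabled", "false"));
        // Single writer with table services inside the ingestion pipeline: in-process lock.
        if (singleWriter && asyncTableServices) {
            return IN_PROCESS;
        }
        // Every other case (e.g. a dedicated compaction/clustering job): filesystem-based lock.
        return FILE_SYSTEM;
    }
}
```

For example, a dedicated compaction job would typically run with async table services disabled, so under this model it gets the filesystem-based provider rather than the in-process one.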

Impact

Better lock performance.

Risk level (write none, low, medium or high below)

low

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Feb 20, 2025
cshuo (Contributor Author) commented Feb 20, 2025

@hudi-bot run azure

hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

CTTY (Contributor) commented Feb 20, 2025

Hi @cshuo, thanks for putting out the fix. I remember @danny0405 and I discussed this: using the in-process lock provider may not be safe because, for Flink, table services will run in a different process. We may need to be careful about it.

An alternative that would help us more is making the filesystem-based lock provider work on storage like S3. cc @yihua
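To make the concern above concrete, here is a simplified model of an in-process lock: one `ReentrantReadWriteLock` per table base path, kept in a static map. This is an illustrative sketch under that assumption, not the source of Hudi's `InProcessLockProvider`; the class `InProcessTableLock` and its methods are hypothetical. The point it shows is that the lock excludes other threads in the same JVM, but a table service running in another process would have its own independent map and would not be blocked at all.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical model of an in-process table lock (one lock per base path).
 * Illustrative only; not Hudi's InProcessLockProvider.
 */
final class InProcessTableLock {
    // Static state lives only in this JVM; another process gets its own, unrelated map.
    private static final ConcurrentHashMap<String, ReentrantReadWriteLock> LOCKS =
        new ConcurrentHashMap<>();

    private InProcessTableLock() {}

    /** Returns the shared lock for a table base path, creating it on first use. */
    static ReentrantReadWriteLock forTable(String basePath) {
        return LOCKS.computeIfAbsent(basePath, p -> new ReentrantReadWriteLock());
    }

    /** Tries once to take the table's write lock from a different thread in this JVM. */
    static boolean tryLockFromOtherThread(String basePath) {
        boolean[] acquired = new boolean[1];
        Thread t = new Thread(() -> {
            ReentrantReadWriteLock.WriteLock w = forTable(basePath).writeLock();
            acquired[0] = w.tryLock();
            if (acquired[0]) {
                w.unlock();
            }
        });
        t.start();
        try {
            t.join(); // join() makes the worker's write to acquired[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return acquired[0];
    }
}
```

While one thread holds the write lock for a base path, `tryLockFromOtherThread` on the same path returns `false`; nothing in this design can coordinate with a second JVM.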

cshuo (Contributor Author) commented Feb 21, 2025

@CTTY thanks for the context.

> using Inprocess lock provider may not be safe because for flink tables services will run in a different process

So in this PR, only a write pipeline with async table services enabled will use the in-process lock. When users run a dedicated compaction/clustering job, async table services are usually disabled, so the lock falls back to the FS-based lock provider. WDYT?

danny0405 (Contributor)

@CTTY @cshuo yeah, the best way is to solve the fs lock provider issue on S3; S3 file creation should already be atomic. @cshuo, can you take some time to research the fs lock provider?
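The filesystem-based approach discussed here rests on one property: creating a lock file must be an atomic create-if-absent, so exactly one contender succeeds. The sketch below shows that property on a local filesystem using `java.nio.file.Files.createFile`, whose existence check and creation are a single atomic operation. It is a hypothetical helper (`LockFileSketch`), not Hudi's `FileSystemBasedLockProvider`, and it leaves out expiry, heartbeats, and stale-lock recovery. Whether an object store such as S3 gives the same guarantee is exactly what the research suggested above would need to confirm.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical lock-file sketch: acquiring the lock = atomically creating a file
 * that must not already exist. Local filesystem only; not Hudi's implementation.
 */
final class LockFileSketch {
    private LockFileSketch() {}

    /** Returns true if this caller created the lock file, false if it already exists. */
    static boolean tryAcquire(Path lockFile) {
        try {
            // Existence check + creation happen as one atomic step; a concurrent
            // creator that loses the race gets FileAlreadyExistsException.
            Files.createFile(lockFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Releases the lock by deleting the lock file (no-op if it is already gone). */
    static void release(Path lockFile) {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A second `tryAcquire` on the same path fails until `release` deletes the file, which is the behavior a lock provider needs from the underlying storage.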

Labels: size:S PR with lines of changes in (10, 100]
Projects: None yet

4 participants