
Reduce unnecessary FSDataOutputStream#hsync to enhance append performance #17516

@TheR1sing3un


Task Description

What needs to be done:
Reduce unnecessary FSDataOutputStream#hsync to enhance append performance.

Before 1.x, a log file could be appended to by different write transactions. When flushing data in the append handle, we therefore had to persist the block data as eagerly as possible to reduce the risk of data loss.
For that reason, every flush calls FSDataOutputStream#hsync, which asks the datanodes to flush the data to disk, and the writer waits synchronously for that request to complete before continuing with subsequent writes.


Since 1.x, however, appending to an existing log file is prohibited.
A log file is therefore opened and written by at most one write transaction, and the data of the entire log file only needs to be visible once that write transaction is committed.
Calling hsync on every flush is thus unnecessary, and it is extremely expensive. Moreover, because our write path is single-threaded, each call blocks the writer, and subsequent writes cannot proceed until the request returns.

So I suggest calling hsync only once, when the stream is closed; that should be sufficient.
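To make the difference concrete, here is a minimal sketch of the two strategies. It does not use the real write path: `CountingSyncStream` is a hypothetical stand-in for a stream that exposes `hsync()` (such as FSDataOutputStream) and only counts calls, and `LogAppendSketch`, `flushBlock`, and the `hsyncOnEveryFlush` flag are names invented for illustration. In this sketch `close()` always calls `hsync()` once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a stream that exposes hsync(), such as FSDataOutputStream.
// It only counts hsync() calls; it does not talk to any file system.
class CountingSyncStream extends ByteArrayOutputStream {
    int hsyncCalls = 0;

    void hsync() {
        hsyncCalls++;
    }
}

// Hypothetical append handle: writes blocks to one log file owned by a single write transaction.
class LogAppendSketch implements AutoCloseable {
    private final CountingSyncStream out;
    private final boolean hsyncOnEveryFlush;

    LogAppendSketch(CountingSyncStream out, boolean hsyncOnEveryFlush) {
        this.out = out;
        this.hsyncOnEveryFlush = hsyncOnEveryFlush;
    }

    void flushBlock(byte[] block) throws IOException {
        out.write(block);
        if (hsyncOnEveryFlush) {
            // Current behavior: wait for a durable sync after every block.
            out.hsync();
        }
    }

    @Override
    public void close() throws IOException {
        // Proposed behavior relies on this single sync before the stream is closed.
        out.hsync();
        out.close();
    }
}

class HsyncSketch {
    // Writes the given number of blocks and returns how many hsync() calls were made.
    static int countHsyncs(boolean hsyncOnEveryFlush, int blocks) throws IOException {
        CountingSyncStream out = new CountingSyncStream();
        try (LogAppendSketch handle = new LogAppendSketch(out, hsyncOnEveryFlush)) {
            for (int i = 0; i < blocks; i++) {
                handle.flushBlock(new byte[] {1, 2, 3});
            }
        }
        return out.hsyncCalls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("hsync on every flush: " + countHsyncs(true, 3));
        System.out.println("hsync on close only:  " + countHsyncs(false, 3));
    }
}
```

With three blocks, the per-flush variant of this sketch makes four `hsync()` calls (three flushes plus the one in `close()`), while the close-only variant makes one. With the real stream, each of those calls is a blocking round trip on the single writer thread, which is where the saving would come from.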

Why this task is needed:

Calling hsync on every flush has a significant impact on append performance.

Task Type

Performance optimization


Labels: type:devtask (Development tasks and maintenance work)
