Description
Task Description
What needs to be done:
Reduce unnecessary FSDataOutputStream#hsync calls to improve append performance.
Before 1.x, a log file could be appended to by different write transactions. Therefore, when flushing data in the append handle, we had to persist the block data as early as possible to reduce the risk of data loss.
For this reason, each flush calls FSDataOutputStream#hsync, which makes the datanodes flush the data to disk, and the writer then waits synchronously for that request to complete before continuing with subsequent writes.
Since 1.x, however, appending to existing log files is prohibited.
A log file can therefore be opened and written to by at most one write transaction, and the data of the entire log file only needs to be visible after that write transaction is committed.
Calling hsync on every flush is therefore unnecessary, and it is very expensive. Moreover, because our write path is single-threaded, it blocks on each hsync, and subsequent writes cannot proceed until the request returns.
I therefore suggest calling hsync only once, when the stream is closed.
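To make the proposal concrete, here is a minimal sketch that compares the number of hsync calls under the two strategies. It does not use Hudi or Hadoop classes: `SyncableStream`, `CountingStream`, `LogBlockWriter`, and `HsyncOnCloseSketch` are hypothetical names introduced only for illustration, and `SyncableStream` is an assumed stand-in for the small part of FSDataOutputStream that matters here. What a per-block flush should do once hsync is removed is also an assumption of this sketch, not something this issue specifies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for the subset of FSDataOutputStream used in this sketch. */
interface SyncableStream {
    void write(byte[] data) throws IOException;
    void flush() throws IOException;  // cheap: does not wait for data to reach disk
    void hsync() throws IOException;  // expensive: blocks until data is persisted
    void close() throws IOException;
}

/** In-memory fake that counts hsync calls so both strategies can be compared. */
class CountingStream implements SyncableStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int hsyncCalls = 0;
    boolean closed = false;

    @Override public void write(byte[] data) throws IOException { buffer.write(data); }
    @Override public void flush() { /* nothing to do for an in-memory buffer */ }
    @Override public void hsync() { hsyncCalls++; }
    @Override public void close() { closed = true; }

    int size() { return buffer.size(); }
}

/** Hypothetical writer for a log file that is owned by a single write transaction. */
class LogBlockWriter implements AutoCloseable {
    private final SyncableStream out;
    private final boolean syncOnEveryFlush;

    LogBlockWriter(SyncableStream out, boolean syncOnEveryFlush) {
        this.out = out;
        this.syncOnEveryFlush = syncOnEveryFlush;
    }

    /** Writes one block; hsync is issued here only in the old per-flush mode. */
    void appendBlock(byte[] block) throws IOException {
        out.write(block);
        out.flush();
        if (syncOnEveryFlush) {
            out.hsync();  // old behavior: the single writer thread blocks on every flush
        }
    }

    /** Both modes sync once more on close; in the proposed mode this is the only hsync. */
    @Override
    public void close() throws IOException {
        out.hsync();
        out.close();
    }
}

class HsyncOnCloseSketch {
    /** Writes the given number of blocks and returns how many hsync calls were made. */
    static int hsyncCallsFor(int blocks, boolean syncOnEveryFlush) {
        CountingStream stream = new CountingStream();
        try (LogBlockWriter writer = new LogBlockWriter(stream, syncOnEveryFlush)) {
            for (int i = 0; i < blocks; i++) {
                writer.appendBlock(new byte[] {1, 2, 3});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stream.hsyncCalls;
    }

    public static void main(String[] args) {
        System.out.println("hsync calls, sync on every flush: " + hsyncCallsFor(5, true));
        System.out.println("hsync calls, sync on close only:  " + hsyncCallsFor(5, false));
    }
}
```

In this sketch the number of blocking hsync calls drops from one per block plus one on close to a single call on close, which is where the expected append speedup would come from.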
Why this task is needed:
Calling hsync on every flush blocks the single-threaded writer and has a significant impact on append performance.
Task Type
Performance optimization
Related Issues
Parent feature issue: (if applicable )
Related issues: