Should we remove the use of versionHintFile from the entire FileSystemCatalog? #10427

@BsoBird

Feature Request / Improvement

For some time, I have been working on fixing correctness issues in FileSystemCatalog. Most of the problems were caused by incorrect version information being written to the versionHint file. The root cause is that the file system currently gives us no way to make multiple file operations atomic, so an interruption partway through a commit leads to all sorts of problems.

However, I noticed that versionHintFile is just an index file; even without it, we can still read the latest version correctly. Without it, a commit only needs to manipulate a single file, so we can ensure atomicity. (Example: list all metadata files, find the max version, and use fs.rename to commit the new version.)
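To make the idea concrete, here is a minimal sketch on a local file system, not the actual FileSystemCatalog code. It assumes metadata files are named `v<N>.metadata.json`; the helper names `latest_version` and `commit_next_version` are hypothetical. Because a plain POSIX rename silently overwrites an existing target, the sketch uses `os.link` as a local stand-in for a rename that refuses to overwrite, which is the behavior the commit step relies on.

```python
import os
import re
import uuid

# Assumed naming scheme for this sketch: v<N>.metadata.json
METADATA_PATTERN = re.compile(r"^v(\d+)\.metadata\.json$")


def latest_version(metadata_dir):
    """Return the highest committed version found by listing, or 0 if none exist."""
    versions = [
        int(m.group(1))
        for name in os.listdir(metadata_dir)
        if (m := METADATA_PATTERN.match(name))
    ]
    return max(versions, default=0)


def commit_next_version(metadata_dir, base_version, content):
    """Try to publish base_version + 1.

    Returns the new version number, or None if another writer already
    committed that version.
    """
    target = os.path.join(metadata_dir, f"v{base_version + 1}.metadata.json")
    # The temp name does not match METADATA_PATTERN, so readers never see it.
    temp = os.path.join(metadata_dir, f".{uuid.uuid4().hex}.tmp")
    with open(temp, "w", encoding="utf-8") as f:
        f.write(content)
    try:
        # Stand-in for a no-overwrite rename: creating the hard link is a
        # single operation that raises FileExistsError if target exists.
        os.link(temp, target)
    except FileExistsError:
        return None
    finally:
        os.remove(temp)
    return base_version + 1
```

A writer would call `latest_version` to pick its base, then `commit_next_version`; a `None` result means it lost the race and should re-read and retry. No hint file is involved, so there is no second file that can fall out of sync with the metadata files.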

In my opinion, the versionHint file does not greatly improve the overall efficiency of a task, and it introduces data inconsistencies. Should we remove the use of versionHintFile from the entire FileSystemCatalog?

Query engine

Spark
