Description
Feature Request / Improvement
For the past while, I've been working on fixing correctness issues in fileSystemCatalog. Most of the problems were caused by incorrect version information being written to the versionHint file. This happens because there is currently no way to guarantee that multiple file operations on the file system are atomic. If a commit is interrupted partway through, the versionHint file can be left with incorrect version information.
However, I noticed that versionHintFile is just an index file; even without it, we can still read the latest version correctly. If a commit only needs to manipulate a single file, we can ensure atomicity. (Example: list all metadata files and find the max version, then use fs.rename to commit the new version.)
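To make the idea concrete, here is a minimal sketch of the list-and-rename approach on a local POSIX-style file system. It is an illustration only, not the catalog's actual code: the `v<N>.metadata.json` naming, the function names `latest_version` and `commit`, and the use of `os.link` as a stand-in for a rename that refuses to overwrite an existing file are all assumptions made for this example.

```python
import os
import re
import uuid

# Assumed naming scheme for this sketch: v<N>.metadata.json
_METADATA_RE = re.compile(r"^v(\d+)\.metadata\.json$")


def latest_version(metadata_dir):
    """Return the highest committed version found by listing, or 0 if none."""
    versions = [
        int(m.group(1))
        for name in os.listdir(metadata_dir)
        if (m := _METADATA_RE.match(name))
    ]
    return max(versions, default=0)


def commit(metadata_dir, base_version, content):
    """Try to publish base_version + 1. Return True on success, False on conflict."""
    final_path = os.path.join(metadata_dir, f"v{base_version + 1}.metadata.json")
    temp_path = os.path.join(metadata_dir, f"{uuid.uuid4().hex}.tmp")

    # Write the full metadata to a temp file first, so readers never see a partial file.
    with open(temp_path, "w", encoding="utf-8") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())

    try:
        # os.link raises FileExistsError if final_path already exists.
        # It stands in here for a non-overwriting fs.rename (an assumption of this sketch).
        os.link(temp_path, final_path)
        return True
    except FileExistsError:
        return False
    finally:
        # The temp name is no longer needed whether or not the commit succeeded.
        os.remove(temp_path)
```

A writer would call `latest_version`, build the new metadata, and call `commit` with the version it read; a `False` result means another writer published that version first, so the writer should re-read and retry. No separate hint file is involved, so there is nothing that can fall out of sync with the metadata files.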
In my opinion, the versionHint file does not greatly improve overall efficiency, and it introduces data inconsistencies. Should we remove the use of versionHintFile from FileSystemCatalog entirely?
Query engine
Spark