Skip to content

Parallel execution of style-files hook causes race condition in styler .onLoad function #571

Closed
@jeancochrane

Description

@jeancochrane

Before filing a bug

  • I have installed the latest dev version of {precommit} with remotes::install_github('lorenzwalthert/precommit')
  • I have installed the latest hook revisions (update with precommit::autoupdate())
  • I have installed the latest release of the upstream Python framework pre-comit as described under the update instructions.

Describe the bug

My team sporadically sees the following error when running the style-files hook (example here):

Error: package or namespace load failed for ‘styler’:
 .onLoad failed in loadNamespace() for 'styler', details:
  call: mkdirs.default(path, mustWork = TRUE)
  error: Failed to create directory (tried 5 times), most likely because of lack of file permissions
(directory '/home/runner/.cache/R/R.cache/styler' exists but nothing beyond):
/home/runner/.cache/R/R.cache/styler/1.10.0
Execution halted

It's difficult to debug since it occurs sporadically, but it happens once every few runs.

I've seen hints of this problem elsewhere, namely in r-lib/styler#1180 (comment) and r-lib/styler#1118 (comment), but I wanted to open a dedicated issue since I had trouble finding any discussion about it.

To Reproduce

It's difficult to reproduce this bug, since it doesn't occur deterministically, but if you clone https://github.com/ccao-data/model-res-avm/ and run pre-commit run style-files --all-files a few times, you're bound to run into it.

Expected behavior

style-files should not raise an error related to R.cache directory permissions.

Additional context

I believe this error is caused by a race condition that occurs when precommit runs multiple styler processes in parallel on the same machine.

Here's my best guess at a root cause: remove_old_cache_files gets the version-specific cache path with a R.cache::getCachePath() call, which will create a directory at that path if it does not exist. Then, the remove_old_cache_files function deletes that directory if it's empty. I'm not 100% confident in this assessment, since I would need to modify the underlying R.utils code in order to confirm it, but I suspect that if process A cleans up the directory in between when the mkdirs() call in process B creates it and then checks to confirm it exists, then process B will raise an error saying that the directory wasn't able to be created. (One remaining mystery that doesn't quite fit this story: I don't totally understand why the fileAccess() call for the parent directory is returning -1, which causes the "likely because of a lack of file permissions" clause in the error message.)

While I'm not totally sure of this root cause analysis, I do feel confident that the problem is due to some sort of race condition in the getCachePath() call in remove_old_cache_files. I've never been able to reproduce the error when setting the require_serial option in my pre-commit config.

Suggested fix

It looks like a try_fetch call was added in r-lib/styler@54df59b and released in v1.10.3 in an attempt to suppress the error; I believe that we're still seeing the error in our precommit calls because precommit is pinned to 1.10.2, so one possible solution would be to just bump styler to 1.10.3 in precommit. Another option would be for us to stop running styler in parallel, which we can do by setting the require_serial option.

That being said, you may also be interested in resolving the underlying race condition instead of suppressing its error. Whatever the precise root cause of the race condition is, it's definitely coming from the mkdirs() call in getCachePath(), so I expect it could be resolved pretty easily by switching to an alternative way of constructing that directory path that doesn't create the directory, like file.path(tools::R_user_dir("R.cache", which = "cache"), "styler", styler_version).

Ultimately, I think it's up to you whether and how you want to handle this issue, I just wanted to share the results of my investigation in case it's useful!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions