Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel execution of style-files hook causes race condition in styler .onLoad function #571

Closed
3 tasks done
jeancochrane opened this issue May 22, 2024 · 1 comment
Closed
3 tasks done

Comments

@jeancochrane
Copy link

jeancochrane commented May 22, 2024

Before filing a bug

  • I have installed the latest dev version of {precommit} with remotes::install_github('lorenzwalthert/precommit')
  • I have installed the latest hook revisions (update with precommit::autoupdate())
  • I have installed the latest release of the upstream Python framework pre-comit as described under the update instructions.

Describe the bug

My team sporadically sees the following error when running the style-files hook (example here):

Error: package or namespace load failed for ‘styler’:
 .onLoad failed in loadNamespace() for 'styler', details:
  call: mkdirs.default(path, mustWork = TRUE)
  error: Failed to create directory (tried 5 times), most likely because of lack of file permissions
(directory '/home/runner/.cache/R/R.cache/styler' exists but nothing beyond):
/home/runner/.cache/R/R.cache/styler/1.10.0
Execution halted

It's difficult to debug since it occurs sporadically, but it happens once every few runs.

I've seen hints of this problem elsewhere, namely in r-lib/styler#1180 (comment) and r-lib/styler#1118 (comment), but I wanted to open a dedicated issue since I had trouble finding any discussion about it.

To Reproduce

It's difficult to reproduce this bug, since it doesn't occur deterministically, but if you clone https://github.com/ccao-data/model-res-avm/ and run pre-commit run style-files --all-files a few times, you're bound to run into it.

Expected behavior

style-files should not raise an error related to R.cache directory permissions.

Additional context

I believe this error is caused by a race condition that occurs when precommit runs multiple styler processes in parallel on the same machine.

Here's my best guess at a root cause: remove_old_cache_files gets the version-specific cache path with a R.cache::getCachePath() call, which will create a directory at that path if it does not exist. Then, the remove_old_cache_files function deletes that directory if it's empty. I'm not 100% confident in this assessment, since I would need to modify the underlying R.utils code in order to confirm it, but I suspect that if process A cleans up the directory in between when the mkdirs() call in process B creates it and then checks to confirm it exists, then process B will raise an error saying that the directory wasn't able to be created. (One remaining mystery that doesn't quite fit this story: I don't totally understand why the fileAccess() call for the parent directory is returning -1, which causes the "likely because of a lack of file permissions" clause in the error message.)

While I'm not totally sure of this root cause analysis, I do feel confident that the problem is due to some sort of race condition in the getCachePath() call in remove_old_cache_files. I've never been able to reproduce the error when setting the require_serial option in my pre-commit config.

Suggested fix

It looks like a try_fetch call was added in r-lib/styler@54df59b and released in v1.10.3 in an attempt to suppress the error; I believe that we're still seeing the error in our precommit calls because precommit is pinned to 1.10.2, so one possible solution would be to just bump styler to 1.10.3 in precommit. Another option would be for us to stop running styler in parallel, which we can do by setting the require_serial option.

That being said, you may also be interested in resolving the underlying race condition instead of suppressing its error. Whatever the precise root cause of the race condition is, it's definitely coming from the mkdirs() call in getCachePath(), so I expect it could be resolved pretty easily by switching to an alternative way of constructing that directory path that doesn't create the directory, like file.path(tools::R_user_dir("R.cache", which = "cache"), "styler", styler_version).

Ultimately, I think it's up to you whether and how you want to handle this issue, I just wanted to share the results of my investigation in case it's useful!

@lorenzwalthert
Copy link
Owner

lorenzwalthert commented May 23, 2024

Hi @jeancochrane, thanks for the detailed and well-linked issue, haven't seen anything alike for a while 👍

Let's first talk about potential fixes:

It looks like a try_fetch call was added in r-lib/styler@54df59b and released in v1.10.3 in an attempt to suppress the error; I believe that we're still seeing the error in our precommit calls because precommit is pinned to 1.10.2, so one possible solution would be to just bump styler to 1.10.3 in precommit.

Yes, everything correct. You can also use it by adding r-lib/styler@ v1.10.3 as an additional_dependencies to your .pre-commit-config.yaml. I will probably at some point release a new hook release to include the latest {styler} version.

Now about the root cause:

While I'm not totally sure of this root cause analysis, I do feel confident that the problem is due to some sort of race condition in the getCachePath() call in remove_old_cache_files

In that case, I think it would be best solved in the {R.cache} package, since {styler} is only leveraging that infrastructure. I am not sure how deep you deep you digged but the issue I opened in {R.cache} is this one: HenrikBengtsson/R.cache#51. I guess as you say, getCachePath() is potentially not designed to work in a parallel setup, so creating a reprex with the failure that only involves {R.cache} might be a good starting point. Alternatively, just implement logic to make it work in parallel is another.

I expect it could be resolved pretty easily by switching to an alternative way of constructing that directory path that doesn't create the directory, like file.path(tools::R_user_dir("R.cache", which = "cache"), "styler", styler_version)

I would not want to do that since I prefer to use the public API of {R.cache} to interact with the cache. In particular if {R.cache} changes its internals, the above code won't potentially give me the correct path to the cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants